Thursday, March 17, 2011

Techniques of calling unmanaged code from .NET and their speed

The default technology for calling unmanaged code (for example, C++ code) from .NET is Platform Invoke. It is available through DllImportAttribute in every managed language that supports method attributes. Usage of this attribute in C# is shown below:
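[DllImport("NativeLib.dll")] // library name is a placeholder; the original name is not given
static extern unsafe void TestCpp(float a, float b, float* result);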
Platform Invoke is a complex technology that handles all the necessary conversions of data structures between the managed and unmanaged worlds. You can use DllImportAttribute's named parameters or other attributes such as InAttribute or OutAttribute. These options let you control the invocation of native methods more precisely, for example through custom marshaling or by specifying the calling convention. Here is a very good article about using PInvoke.
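For reference, the native side that such a declaration binds to could look like the sketch below. Only the TestCpp(float, float, float*) signature comes from this post; the EXPORT macro is a hypothetical helper, and the function body is borrowed from the C++/CLI example further down purely for illustration. The extern "C" linkage keeps the exported name unmangled so the runtime can find it by name.

```cpp
// Hypothetical native side of the TestCpp P/Invoke declaration.
// EXPORT is an illustrative macro, not part of any library.
#if defined(_WIN32)
#define EXPORT extern "C" __declspec(dllexport)
#else
#define EXPORT extern "C"
#endif

// Two floats in, one float out through a pointer,
// matching the managed declaration parameter by parameter.
EXPORT void TestCpp(float a, float b, float* result)
{
    float tmp = a - b;
    *result = a * tmp + b * tmp; // algebraically a*a - b*b
}
```

Note that on 32-bit Windows DllImport defaults to stdcall while a plain C++ function like this one is cdecl, so the managed declaration would then need CallingConvention = CallingConvention.Cdecl.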
If you measure PInvoke performance, you will probably find that it is not very good. When an application calls unmanaged code often, we should think about the number of transitions between managed and unmanaged code, because each transition is a performance bottleneck. That is mainly due to the security checks the .NET runtime performs before each call to an unmanaged function: it walks the call stack and checks that every caller has the appropriate rights. We can suppress this behaviour with the SuppressUnmanagedCodeSecurity attribute, which can really help performance when there are a lot of transitions.

The next technique is to use the GetDelegateForFunctionPointer method defined in the Marshal class. The unmanaged library simply provides a pointer to the function we want to call, and the managed application creates a delegate for this pointer with GetDelegateForFunctionPointer. This solution has the same performance problem with call stack security checks. We can suppress them by applying the SuppressUnmanagedCodeSecurity attribute to the definition of the delegate type used for the unmanaged function. As the results at the end show, this method of calling unmanaged code is not as efficient as PInvoke. If we need to set the calling convention, we just use the UnmanagedFunctionPointer attribute. The example below shows the usage of the SuppressUnmanagedCodeSecurity and UnmanagedFunctionPointer attributes:
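[SuppressUnmanagedCodeSecurity, UnmanagedFunctionPointer(CallingConvention.Cdecl)] // Cdecl is an assumed convention
unsafe delegate void MyUnmanagedDelegate(float a, float b, float* result);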
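On the unmanaged side, "providing a pointer to the function" might be sketched as follows. The names TestFunction, TestImpl and GetTestFunctionPointer are hypothetical, as is the EXPORT macro; the post does not show how its library hands out the pointer. The managed code would obtain the returned address as an IntPtr and pass it to Marshal.GetDelegateForFunctionPointer.

```cpp
// Hypothetical native side for the GetDelegateForFunctionPointer approach.
#if defined(_WIN32)
#define EXPORT extern "C" __declspec(dllexport)
#else
#define EXPORT extern "C"
#endif

// Pointer type with the same shape as the managed delegate.
typedef void (*TestFunction)(float a, float b, float* result);

// The function we want to reach; it is not exported itself.
static void TestImpl(float a, float b, float* result)
{
    float tmp = a - b;
    *result = a * tmp + b * tmp;
}

// Exported accessor that returns the address of TestImpl.
EXPORT TestFunction GetTestFunctionPointer()
{
    return &TestImpl;
}
```

Because TestImpl uses the compiler's default convention (cdecl on x86), the delegate type in this sketch would have to be marked with UnmanagedFunctionPointer(CallingConvention.Cdecl).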
The last but not least technique is to use C++/CLI to create a layer between managed and unmanaged code. We just need to write a managed type that exposes our unmanaged functionality. That is quite easy, because C++/CLI lets us mix managed and unmanaged code. Simple C++/CLI code providing this functionality can look like this:
#pragma unmanaged
void unmanagedFunction(float a, float b, float* result)
{
 float tmp = a - b;
 *result = a * tmp + b * tmp;
}
#pragma managed
using namespace System;
namespace cliLib
{
 public ref class cliClass
 {
 public:
  static void CallUnmanagedFunction(float a, float b, float* result)
  {
   unmanagedFunction(a, b, result);
  }
 };
}
The last method I will write about is kind of special. I discovered it a few weeks ago while working on a library for speeding up math operations in XNA. It is based on the calli instruction of the MSIL language, which performs an indirect method call. What is the magic here? The main argument of calli is a pointer to the machine code that will start executing, which means we can give it a pointer to an unmanaged function. There is just one problem: when we try it, it doesn't work. The cause is the calling convention. The CLR uses the fastcall calling convention, which means function arguments are passed in registers when possible. The solution is simple, though: we just need to specify the fastcall calling convention for our unmanaged function. After that, we can write a small MSIL library or use Reflection.Emit to call the unmanaged function.
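Specifying fastcall for the unmanaged function might look like the sketch below. The NATIVE_FASTCALL macro and the FastTest name are hypothetical. The fastcall keyword only has an effect on 32-bit x86, so the macro expands to nothing on other targets. Whether the resulting convention really matches what the runtime expects for a given signature should be verified by testing.

```cpp
// Illustrative macro: the fastcall keyword where it is meaningful,
// nothing on targets that have no separate fastcall convention.
#if defined(_MSC_VER) && defined(_M_IX86)
#define NATIVE_FASTCALL __fastcall
#elif defined(__GNUC__) && defined(__i386__)
#define NATIVE_FASTCALL __attribute__((fastcall))
#else
#define NATIVE_FASTCALL
#endif

// Unmanaged function whose address would be handed to calli.
extern "C" void NATIVE_FASTCALL FastTest(float a, float b, float* result)
{
    float tmp = a - b;
    *result = a * tmp + b * tmp;
}
```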

Here is a small performance test of the techniques mentioned:
C# (informative): 4318 ms
PInvoke - suppressed security: 5415 ms
Calli instruction: 5505 ms
C++/CLI: 6311 ms
Function delegate - suppressed security: 7788 ms
PInvoke: 8249 ms
Function delegate: 11594 ms

I tested simple code with float arithmetic over 67108864 iterations.
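The benchmark code itself is not included in this post, so the following is only a guess at its general shape: a timing loop around a small float kernel. TestKernel, TimeKernelMs and the choice of inputs are all assumptions, and this sketch measures only a plain native call. The managed call paths would be timed with an analogous loop on the .NET side.

```cpp
#include <chrono>
#include <cstdint>

// Assumed kernel: the same float arithmetic as the examples above.
static void TestKernel(float a, float b, float* result)
{
    float tmp = a - b;
    *result = a * tmp + b * tmp;
}

// Calls the kernel `iterations` times and returns the elapsed milliseconds.
// The last result is written to `sink` so the work has an observable effect.
long long TimeKernelMs(std::int64_t iterations, float* sink)
{
    auto start = std::chrono::steady_clock::now();
    float result = 0.0f;
    for (std::int64_t i = 0; i < iterations; ++i)
    {
        // Vary the first argument so the calls are not all identical.
        TestKernel(static_cast<float>(i & 1023), 2.0f, &result);
    }
    auto stop = std::chrono::steady_clock::now();
    *sink = result;
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

A run comparable in length to the test above would be TimeKernelMs(67108864, &sink). An optimizing compiler may inline the kernel, so the numbers from such a loop are only a rough baseline.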


  1. Hi! Good post.
    How do you run the performance tests?
    I'm interested in the function delegate method: do you obtain the delegate via GetDelegateForFunctionPointer once and then call it, or do you obtain the delegate on each iteration?
    Maybe you have sample code?

  2. Hi, no, I create the delegate via GetDelegateForFunctionPointer just once and then call it in a simple for loop. Firstly, there's no reason to create a new delegate instance for the function pointer each time, and secondly, it would be very inefficient.
    Thanks for reading.

  3. Hi, thanks, understood.

    And one last question: why did you write that calli uses the fastcall convention? When I'm emitting the calli instruction I can choose any calling convention...

  4. Hmm, interesting. I wasn't using Reflection.Emit (now I see there's an option); I was using a simple library written in MSIL and didn't realize I could change the convention. Have you tried it with a convention other than fastcall? I'm interested in whether it works, but I don't have time to test it now.

  5. I'm trying stdcall, and it works for me.
    When I try fastcall, I get an exception saying that unmanaged code can be called only by stdcall, cdecl or thiscall.
    And my timings (all with suppressed unmanaged code security): calli is twice as slow as P/Invoke, and delegates are twice as slow as calli.
    So actually I don't know where I'm wrong. :)

  6. Oh. When I used Reflection.Emit I specified unmanaged fastcall, which is not supported. In fact we need to emit a managed calli with the standard calling convention, which is internally fastcall. So, great post. Thanks. :)

  7. My results:

    MCall - calli with the standard call convention (and fastcall on the unmanaged side)
    SCall - unmanaged stdcall

    SCall Calli - the same technique as MCall, but using the 'calli unmanaged stdcall ...' construction.

    OpMinus2 - two int32 args
    OpMinus4 - four int32 args

    These results are from AMD64 (x64), but with x86 assemblies.

    All tests use a for loop with a count of 100 000 000.

    C# (informative): 332ms 1,00x
    MCall: 448ms 0,74x
    SCall Calli: 1309ms 0,25x
    SCall PInvoke: 658ms 0,50x

    C# (informative): 474ms 1,00x
    MCall: 736ms 0,64x
    SCall Calli: 1516ms 0,31x
    SCall PInvoke: 777ms 0,61x

  8. Hi, Very nice post.

    From my own experience, explicit PInvoke is the best performer. When I wrote my C# Wrapper Generator for C++ DLL, I thought about using calli to call the native C++ methods, but the performance was not always good and it sometimes threw exceptions. So I chose to stick with explicit PInvoke: pure and clean, it is the king among all the different ways of calling native methods.

  9. Is the code for this post not available anywhere?