Thursday, March 17, 2011

Techniques of calling unmanaged code from .NET and their speed

The default technology for calling unmanaged code (for example, C++ code) from .NET is Platform Invoke (PInvoke). It is available in every managed language that supports method attributes, through DllImportAttribute. The usage of this attribute in C# is shown below:
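[DllImport("NativeLib.dll")] // placeholder library name; the original listing does not show it
static unsafe extern void TestCpp(float a, float b, float* result);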
Platform Invoke is a complex technology that performs all the necessary conversions of data structures between the managed and unmanaged worlds. You can use the named parameters of DllImportAttribute, or other attributes such as InAttribute and OutAttribute, to control the invocation of native methods more precisely, for example to apply custom marshaling or to specify the calling convention. Here is a very good article about using PInvoke.
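For context, the native side of such a declaration could look like the sketch below. It assumes the function is exported from a C++ library with C linkage, so that its name is not mangled. The NATIVE_EXPORT macro is a hypothetical helper, and the body borrows the arithmetic from the C++/CLI listing later in this post, because the post does not show what TestCpp actually computes.

```cpp
// Export macro: a hypothetical helper, not part of the original post.
#if defined(_WIN32)
#define NATIVE_EXPORT extern "C" __declspec(dllexport)
#else
#define NATIVE_EXPORT extern "C" __attribute__((visibility("default")))
#endif

// extern "C" keeps the exported name unmangled, so DllImport can find
// the entry point "TestCpp" by name.
// The body is an assumption: it reuses the arithmetic of unmanagedFunction.
NATIVE_EXPORT void TestCpp(float a, float b, float* result)
{
    float tmp = a - b;
    *result = a * tmp + b * tmp;
}
```

Note that on 32-bit Windows the default PInvoke calling convention is StdCall, while a function written like this compiles as cdecl by default, so one of the two sides has to be adjusted, for example with CallingConvention = CallingConvention.Cdecl in DllImport.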
If you measure PInvoke performance, you will probably find that it is not very good. When an application calls unmanaged code often, the number of transitions from managed to unmanaged code matters, because each transition is a performance bottleneck. This is mainly due to the security checks that the .NET runtime performs before each call to an unmanaged function: it walks the call stack and checks whether every caller has the appropriate rights. We can suppress this behaviour with the SuppressUnmanagedCodeSecurity attribute, which can really improve performance when we need to make a lot of these transitions.

The next technique uses the GetDelegateForFunctionPointer method defined in the Marshal class. Our unmanaged library simply provides a pointer to the function we want to call, and the managed application creates a delegate for this pointer with GetDelegateForFunctionPointer. This solution has the same performance problem with call stack security checks. We can suppress them by applying the SuppressUnmanagedCodeSecurity attribute to the definition of the delegate type we use for our unmanaged function. As the results at the end show, this way of calling unmanaged code is not as efficient as PInvoke. If we need to set the calling convention, we just use the UnmanagedFunctionPointer attribute. The example below shows the usage of the SuppressUnmanagedCodeSecurity and UnmanagedFunctionPointer attributes:
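[SuppressUnmanagedCodeSecurity, UnmanagedFunctionPointer(CallingConvention.Cdecl)] // Cdecl is an assumption; match your native function
unsafe delegate void MyUnmanagedDelegate(float a, float b, float* result);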
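The unmanaged half of this technique, "providing a pointer to the function", might be sketched as follows. The names TestFunction, TestImpl and GetTestFunction are hypothetical, as is the NATIVE_EXPORT macro. The idea is that the managed side calls the exported accessor once (for example via PInvoke, receiving an IntPtr) and passes the returned pointer to GetDelegateForFunctionPointer.

```cpp
// Export macro: a hypothetical helper, not part of the original post.
#if defined(_WIN32)
#define NATIVE_EXPORT extern "C" __declspec(dllexport)
#else
#define NATIVE_EXPORT extern "C" __attribute__((visibility("default")))
#endif

// Pointer type matching the managed delegate's signature.
typedef void (*TestFunction)(float a, float b, float* result);

// The function itself is not exported; it is only reachable through
// the pointer handed out by the accessor below.
static void TestImpl(float a, float b, float* result)
{
    float tmp = a - b;
    *result = a * tmp + b * tmp;
}

// Exported accessor (hypothetical name) that returns the function pointer.
NATIVE_EXPORT TestFunction GetTestFunction()
{
    return &TestImpl;
}
```

TestImpl uses the compiler's default calling convention here, which is cdecl on 32-bit x86, so the delegate type would need a matching UnmanagedFunctionPointer attribute.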
A third technique is to use C++/CLI to create a layer between managed and unmanaged code. We just need to write a managed type that exposes our unmanaged functionality. That is quite easy, because C++/CLI lets us mix managed and unmanaged code. Simple C++/CLI code providing this functionality can look like this:
#pragma unmanaged
void unmanagedFunction(float a, float b, float* result)
{
 float tmp = a - b;
 *result = a * tmp + b * tmp;
}
#pragma managed
using namespace System;
namespace cliLib
{
 public ref class cliClass
 {
 public: // members of a ref class are private by default
  static void CallUnmanagedFunction(float a, float b, float* result)
  {
   unmanagedFunction(a, b, result);
  }
 };
}
The last method I will write about is a bit special. I discovered it a few weeks ago while working on a library for speeding up math operations in XNA. It is based on the magic calli instruction of MSIL. Calli stands for indirect method call. And what is the magic? As its main argument, calli takes a pointer to machine code, which it then starts executing. That means we can give it a pointer to an unmanaged function. There is just one problem: when we try it, it doesn't work. This is due to the calling convention the CLR uses. The CLR uses a fastcall calling convention, which means function arguments are passed in registers when possible. Fortunately, the solution is simple: we just need to specify the fastcall calling convention for our unmanaged function. After that, we can write a small MSIL library or use Reflection.Emit to call our unmanaged function.
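Declaring the unmanaged function as fastcall could look roughly like the sketch below. All names (NATIVE_FASTCALL, NATIVE_API, FastTest, GetFastTestFunction) are hypothetical. It assumes a 32-bit x86 build, since that is the only target where fastcall is a distinct convention; on other targets the macro expands to nothing. The MSIL side that issues calli is not shown, and whether the CLR's convention really matches the compiler's fastcall for a given signature should be verified on the target platform.

```cpp
// fastcall is only a distinct convention on 32-bit x86;
// on other targets this hypothetical macro expands to nothing.
#if defined(_MSC_VER) && defined(_M_IX86)
#define NATIVE_FASTCALL __fastcall
#elif defined(__GNUC__) && defined(__i386__)
#define NATIVE_FASTCALL __attribute__((fastcall))
#else
#define NATIVE_FASTCALL
#endif

// Export macro: also a hypothetical helper.
#if defined(_WIN32)
#define NATIVE_API __declspec(dllexport)
#else
#define NATIVE_API __attribute__((visibility("default")))
#endif

extern "C" {

// Pointer type that carries the fastcall convention.
typedef void (NATIVE_FASTCALL *FastTestFunction)(float, float, float*);

// The function that the calli instruction would jump to.
void NATIVE_FASTCALL FastTest(float a, float b, float* result)
{
    float tmp = a - b;
    *result = a * tmp + b * tmp;
}

// Exported accessor: the managed side fetches this address once
// and then uses it as the target of calli.
NATIVE_API FastTestFunction GetFastTestFunction()
{
    return &FastTest;
}

} // extern "C"
```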

Here is a small performance test of the techniques mentioned above:
Technique                                Time
C# (informative)                         4318 ms
PInvoke - suppressed security            5415 ms
Calli instruction                        5505 ms
C++/CLI                                  6311 ms
Function delegate - suppressed security  7788 ms
PInvoke                                  8249 ms
Function delegate                        11594 ms

I tested simple code with float arithmetic over 67,108,864 (2^26) iterations.