Sunday, March 20, 2011

Speeding up sigmoid function by approximating exponential function

I will talk about sigmoid function defined by formula
where x is real and k is positive constant. Its graph is for k=4 shown bellow.

This function can be used for example as activation function in neural networks. When we use it as activation function in neural network a code evaluating it will be very stressed. So our approach should be to optimize it.
For simplicity we will assume k=4 for rest of this post. Classical implementation in C# may look like this simple code:
private static float Sigmoid(float signal)
{
    return 1f / (1f + (float)Math.Exp(-4f * signal));
}
Unfortunately there're not any single precision Math class in .Net. If we need only single precision values it's inefficient to calculate all values in double precision. The first idea how to solve this problem could be usage of C language where are single precision mathematical functions. But how it was shown in last post about calling unmanaged code from .Net it wouldn't be the best solution due to high frequency of calling unmanaged code.
The slowest part of this small code is of course evaluation of exponential function. In lot of situations  we do not need so precise results and we can use some kind of approximation. This is also a case of neural networks mentioned before. We will base our approximation on well known formula
We can take as an approximation of e^x formula (1+x/n)^n for some finite n. The higher n we select the closer approximation we will get. On the graph below we can see sigmoid function with approximated exponential for n=16.

There's again one small problem which we can see on next image.

As we can see the problem with approximated sigmoid function is that its limit for x goes to infinity is 0 but limit of sigmoid function for x goes to infinity is 1. We can see the problem is solvable by one simple if-statement. We find out that local maximum of approximated sigmoid function is at x=4.0. Now we just say that approximated function for x>4.0 will be 1. The code for approximated sigmoid function can look like this:
private static float Sigmoid(float signal)
{
    if (signal >= 4f) return 1f;
    float tmp = 1f - 0.25f * signal;
    tmp *= tmp;
    tmp *= tmp;
    tmp *= tmp;
    tmp *= tmp;
    return 1f / (1f + tmp);
}
This code looks more complex than original version but it's approximately 78% faster. I was testing the speed of both methods with numbers from interval (-0.5, 0.5). If we try some interval which contains numbers bigger than our boundary value 4.0 it should be even faster.

4 comments:

  1. Nice article! However, it contains an error. The maximum of that function is at 4, not 3.60672.

    Proof: the first derivative is (4*(1-x/4)^15)/((1-x/4)^16+1)^2, which is zero at the local maximum. Solving this equation gives x=4. You can verify that by substituting 4 for x, which gives zero. This also means clipping at 4 ensures derivatives are also continuous.

    ReplyDelete
  2. Yes, I'm sorry for that. Of course you are right. I was computing it in Mathematica 8 and it gave me incorrect result 3.60672 and I didn't check it (even though it's obvious). I will fix it in the post. Thanks for comment.

    ReplyDelete
  3. Hello,

    Something I've wondered about for some time is what happens to units once we pump them into a exponential function or log function.

    For example in neutron attenuation

    I(x) = I_0 * exp(-Sigma * x)

    I feel like this something I should know, but I just don't get it.

    I suspect what's happening is that the units stay the same since basically all the function is doing is changing the magnitude of the scalar quantity associated with the unit, and not necessarily what the unit measures.

    If that's so is there a word I can use to describe a function that does change a inputs unit (such as the aformentioned) and one that does not?

    ReplyDelete
  4. Hi.
    What is the name of this approximation method?
    Thanks.

    ReplyDelete