Wednesday, December 20, 2006

In a similar theme to my last post, i'm going to harp on about a much more useful and not so obvious performance boost i made:

Firstly, consider the following two method signatures:
M1: public static Matrix Add(Matrix matrix1, Matrix matrix2)
M2: public static void Add(ref Matrix matrix1, ref Matrix matrix2, out Matrix result)

From first appearances, which method would you expect to be faster? And why? I'll come back to that in a minute.

Now, the way the code was initially written is that all the logic to add two matrices was contained in M1. So if you called M2, internally M2 would just call M1 and return the result to you. That's fine. It's definitely a good thing to not write the same code in 2 or more methods! This code resulted in the following performance characteristics:
M1 called 5,000,000 times: 500ms
M2 called 5,000,000 times: 750ms

As you can see, contrary to what you might think, M1 was performing significantly faster than M2! Now, the reason why you would expect M2 to perform *much* faster is that passing by ref means that the Matrix struct will not be copied into a new stack frame. Instead a reference to it will be there (which is significantly smaller in size, therefore faster to create). However what was happening is that a reference was being passed to M2, then M1 passed a new copy of the matrix to M1. This meant that all the benefits of using ref were lost.

So, my quick change was to move all the logic to "Add" two matrices into the Matrix.Add(ref,ref,out) method. That way whenever anyone called M2, there would be no copying of structs done, thus potentially making the method much faster. Whenever M1 is called, it then calls M2 internally to do the actual adding, and returns the value then. This way of doing things resulted in the following performance characteristics:
M1 called 5,000,000 times: 720ms
M2 called 5,000,000 times: 200ms

As you can see, M2 is now *much* faster than M1, which is what would be expected when using ref parameters. Also, M1 is now a bit slower, due to the extra overhead of calling M2 to do the adding, but not significantly so.

Just as a reference, heres the same code running under Microsofts XNA:
M1 called 5,000,000 times: 670ms
M2 called 5,000,000 times: 200ms

If i really wanted, i could increase the speed of Matrix.Add(Matrix,Matrix) by writing the adding logic in there aswell, but i don't really want to do that. If you want performance, you'll use the Add(ref,ref,out) option.

The before and after code can be seen here

4 comments:

Anonymous said...

I'm not an expert of C#, but i think it could be very usefull to add a Obsolete attribute or something else to M1 to inform the developper that M1 is bad ?

Anonymous said...

If Matrix was a class instead of a struct then both add methods would run in about the same time. Large structs are usually better as classes if you want performance without the effort of using refs.
Is there a particular reason it must be a struct?

Alan said...

@Comment1: It's not bad. In some circumstances you'd be as well off to use M1 as opposed to M2. Supposing that you were *only* adding two matrices, in that case M1 is perfect for your needs. However if you are doing a lot of arithmatic with the matrices doing lots of adding and whatnot, then M2 is what you need.

The cost of using M1 for a single addition is the same as using M2 for a single addition. M1 is less typing though ;)

@Comment2: The main reason is that XNA is a clearly defined framework as designed by microsoft. Microsoft decided to make it a struct, therefore we're implementing it as a struct, otherwise we'd be incompatible with the MS XNA framework, which is a bad thing.

Secondly, by changing to a class you'd be allocating on the heap as opposed to the stack. If Matrix was a class, you'd find that M1 would perform significantly worse than M2 as you would have to keep allocating new objects onto the heap, which is slow.

Alan said...

A quick test of changing it to a class results in these times:
M1 called 5,000,000 times: 500ms
M2 called 5,000,000 times: 560ms

Unfortunately this method also results in 6866 gen 0 garbage collections, and 1 gen 1 collection. These might really hurt performance in production code by promoting other gen0 objects to gen1 when they really shouldn't be going to gen 1.

Hit Counter