Saturday, January 26, 2008

Today I'm going to talk a little about optimisation. This is mostly in the context of the MonoTorrent Tracker, but some of the ideas apply to other situations too.

Sometimes you hear people talking about how they want to optimise their code to make it faster, and they ask questions like 'what's the fastest way to multiply by 2, bitshift or regular multiplication?', or 'Should I null out objects the very second I'm finished using them, or just wait for them to fall out of scope?'. These kinds of questions are the epitome of premature optimisation. These kinds of optimisations won't make your application faster or better.
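To illustrate how little the first question matters: the two forms are equivalent for any power of two, and a modern JIT will typically strength-reduce the multiplication to the same machine code anyway, so the choice is purely one of readability. A trivial sketch (in Java, since the original C# isn't shown):

```java
public class ShiftVsMultiply {
    // Identical results; a JIT usually emits identical code for both.
    static int timesTwoShift(int x) { return x << 1; }
    static int timesTwoMultiply(int x) { return x * 2; }

    public static void main(String[] args) {
        for (int x : new int[] { 0, 1, -7, 1 << 30 }) {
            // The 'optimised' version buys nothing but less readable source.
            System.out.println(x + " * 2 = " + timesTwoMultiply(x)
                    + ", " + x + " << 1 = " + timesTwoShift(x));
        }
    }
}
```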

I've done a lot of work on the Tracker code (the 'server' portion of the bittorrent specification) recently. I had precomputed values which are used regularly, optimised the hashcodes used in my dictionary lookups, and significantly reduced the amount of data that needs to be kept in memory. I wanted to see what effect all this had on the actual running of the tracker. This was the first time I had run a benchmark on it.

I went on the net, found a big tracker and checked its stats. Using these, I decided that this benchmark was representative of a heavy real-world load:

1) Load 2000 torrents into the engine, each of which contains 1000 peers.
2) Hammer the server with 1000 requests a second, each time choosing a random torrent and a random peer from the list and making a fake request from that peer to the server.
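The load-generation side of the benchmark can be sketched roughly like this (in Java for illustration; the constants come from the list above, but the structure and the `sendFakeAnnounce` hook are my assumptions, not the real benchmark code):

```java
import java.util.Random;

public class TrackerBenchmark {
    static final int TORRENTS = 2000;
    static final int PEERS_PER_TORRENT = 1000;
    static final int REQUESTS_PER_SECOND = 1000;

    // Pick a random (torrent, peer) pair to impersonate for one request.
    static int[] pickRequest(Random rng) {
        return new int[] { rng.nextInt(TORRENTS), rng.nextInt(PEERS_PER_TORRENT) };
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        // One second's worth of load: in the real benchmark each pair
        // would become a fake announce against the tracker.
        for (int i = 0; i < REQUESTS_PER_SECOND; i++) {
            int[] req = pickRequest(rng);
            // sendFakeAnnounce(req[0], req[1]);  // hypothetical hook
        }
        System.out.println(REQUESTS_PER_SECOND + " requests generated");
    }
}
```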

So, once that was written, I fired up the tracker and ran the benchmark. My system locked up and I was forced to hard-reboot. What had gone wrong? I started the benchmark again, but monitored memory and CPU usage carefully. I was surprised to find that memory usage was rocketing, which is what caused the massive slowdown of my system. I couldn't understand why. A few quick calculations of how much memory I'd expect the tracker to use put final memory usage at far less than 300MB. I quickly whipped out my allocation profiler and began optimising. Here are the 'issues' I fixed:

1) The objects I was using as the 'key' in a dictionary were being recreated every time I used them. Typically this meant that for every request to the tracker, at least a dozen complex objects were created and garbage collected needlessly. In this kind of scenario, the objects should be declared 'static readonly' and reused. I implemented this change.
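The shape of that fix, sketched in Java (where C#'s 'static readonly' becomes 'static final'); the RequestKey class and the key names are hypothetical stand-ins for the tracker's real key type:

```java
import java.util.HashMap;
import java.util.Map;

public class StaticKeys {
    // A hypothetical stand-in for the tracker's dictionary-key type.
    static final class RequestKey {
        final String name;
        RequestKey(String name) { this.name = name; }
        @Override public int hashCode() { return name.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof RequestKey && ((RequestKey) o).name.equals(name);
        }
    }

    // Before: `new RequestKey("info_hash")` on every request -> needless garbage.
    // After: one shared immutable instance, created once and reused.
    static final RequestKey INFO_HASH = new RequestKey("info_hash");
    static final RequestKey PEER_ID = new RequestKey("peer_id");

    public static void main(String[] args) {
        Map<RequestKey, String> request = new HashMap<>();
        request.put(INFO_HASH, "aabbcc");
        // Lookup with the shared key: zero allocations per request.
        System.out.println(request.get(INFO_HASH)); // prints "aabbcc"
    }
}
```

Because the keys are immutable, sharing one instance across every request is safe, and the per-request allocation (and the collection pressure it causes) disappears entirely.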

The benchmark still couldn't run.

2) I decided that the next problem was that I was pre-generating two byte[] for each peer when the peer was added to the server, so that a request could be fulfilled simply by copying the pregenerated byte[]. I changed this to generate the byte[] at request time rather than storing it in memory. I expected this to fix the issue.
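This is a classic memory-for-CPU trade-off reversed: encoding at request time costs a few cycles per announce but stops the tracker paying for two arrays per peer, forever. The post doesn't say exactly what the byte[] contained; assuming it was something like the standard compact peer entry (4-byte IPv4 address plus 2-byte big-endian port), the on-demand version looks like this:

```java
public class PeerEncoding {
    // Compact peer entry: 4 bytes of IPv4 address + 2 bytes of port,
    // big-endian, as used in compact announce responses.
    static byte[] encodeCompact(byte[] ipv4, int port) {
        byte[] out = new byte[6];
        System.arraycopy(ipv4, 0, out, 0, 4);
        out[4] = (byte) (port >> 8);
        out[5] = (byte) port;
        return out;
    }

    public static void main(String[] args) {
        // Before the fix this array was built once per peer and *stored*,
        // costing memory for every peer on every torrent. After the fix it
        // is built here, at request time, and discarded with the response.
        byte[] encoded = encodeCompact(new byte[] { 10, 0, 0, 1 }, 6881);
        System.out.println(encoded.length); // 6 bytes per peer
    }
}
```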

The benchmark still couldn't run after this change.

3) Finally, I noticed there was a huge number of hashtable-related objects being retained in memory. This was a bit weird; there shouldn't have been that many around. A few minutes of checking the code made me realise that the probable cause was keeping a NameValueCollection object in memory for each peer. I rewrote the peer class to extract the necessary information from the collection and then discard it, rather than holding a reference to it.
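The change amounts to copying out the few fields the tracker actually needs in the constructor and letting the collection die. A Java sketch, with a Map standing in for .NET's NameValueCollection; the field choices are my assumptions, loosely following the standard announce parameters:

```java
import java.util.Map;

public class Peer {
    // Only the handful of fields the tracker needs survive; the
    // (comparatively heavy) parameter collection is not retained.
    final String peerId;
    final int port;
    final long uploaded;

    // Holding `params` itself in a field is what kept one hashtable
    // alive per peer; extracting and dropping it fixes that.
    Peer(Map<String, String> params) {
        this.peerId = params.get("peer_id");
        this.port = Integer.parseInt(params.get("port"));
        this.uploaded = Long.parseLong(params.get("uploaded"));
        // `params` is now unreferenced and eligible for collection.
    }

    public static void main(String[] args) {
        Peer p = new Peer(Map.of("peer_id", "-MO1000-abcdefghijkl",
                                 "port", "6881", "uploaded", "0"));
        System.out.println(p.peerId + " on port " + p.port);
    }
}
```

With two million peers in the benchmark, the difference between "three primitive fields per peer" and "a whole hashtable per peer" is exactly the kind of thing that only shows up under load.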

The benchmark could run!

Now, the memory improvements were gigantic. Previously I had stats like this:
Active Torrents: 500
Active Peers per Torrent: 500
Memory: 350MB

Now I had:
Active Torrents: 500
Active Peers per Torrent: 500
Memory: 40MB

Active Torrents: 2000
Active Peers per Torrent: 1000
Memory: 140MB

There is no way in hell I'd ever have found the cause of the issue unless I had run a profiler. So, for anyone who is trying to make their code run fast and efficiently: you need a profiler. You can't get away without it.


JD said...

Glad to see you've entered the profiler fanboy camp. :) I did the same thing recently with a LINQ to SQL optimization in a Facebook app of mine: Linq to SQL Surprise Performance Hit.
