Saturday, April 14, 2007

I had a rather interesting experience on the bittorrent mailing list recently. There was a guy who had a server which was severely IO limited and he was trying to figure out how best to improve performance. The problem was that due to the randomness of how a piece is chosen to be downloaded in bittorrent, his server was having to do a lot of random disk seeking thus limiting his maximum upload rate to about 11-15MB/sec even though he had the bandwidth to handle much more than that.

There are hacks to avoid this kind of IO limiting, such as advertising that you have no pieces available When you connect to a peer and then telling them about selective pieces later. The idea is that the piece you advertise having, you'd buffer in memory and so when the peer requests the piece, you don't have to seek to disk to get it for them. With a sufficient memory cache you significantly reduce the amount of random disk access needed and increase throughput hugely

However, this has several disadvantages. The biggest disadvantage of this method is that it can (and does) reduce overall throughput. Take a situation where a torrent has 100 pieces. You put the first 10 pieces into memory and then send messages advertising those 10 pieces to everyone. If everyone already has those pieces, then no uploading will occur until you swap out those pieces and upload the next batch.

Secondly, you *must* disconnect every peer you've connected to every time you decide to swap out those 10 pieces. If you don't, the other peers will remember about the previous pieces you've advertised and may request those even though you no longer have them in memory. This means you once again have random seeking. However, the amount of random seeking would still be significantly reduced as each peer would only know about 20 pieces out of the actual 100 if they weren't disconnected.

Thirdly, this method of seeding requires special logic. Therefore if you want this kind of special handling you must either hack it into an existing open source client yourself, pay someone to do it, or find a client that supports this and use it (regardless of whatever other problems/deficits that client may have).

Now for the fun part: This issue was addressed in the Fast Extensions for the bittorrent protocol with the "SuggestPiece" message.

Suggest Piece is an advisory message meaning "you might like to download this piece." The intended usage is for 'super-seeding' without throughput reduction, to avoid redundant downloads, and so that a seed which is disk I/O bound can upload continguous or identical pieces to avoid excessive disk seeks.

As you can see, the Suggest Piece was designed to fix the exact problems i've described above, unfortunately due to either bad wording, or just bad interpretation, this message is going to fail miserably in that.

The SuggestPiece message is an "Advisory" message. My own opinion is that Advisory was added there to signify that you do *not* have to act on that message. The reason for this is that you could already be downloading piece 3 off peer X when peer Y sends a "SuggestMessage" suggesting you download Piece 3 off him instead. In this case, you would not follow the suggestion, you would just ignore it. That makes sense. It's logical. Downloading the same piece twice would just be stupid ;)

Unfortunately, several bittorrent developers who've implemented the Fast Extensions have mis-interpreted this as "You can completely ignore this message if you want to". One developer cited his reason for completely ignoring this message as "i have no use for this, so i ignore it". This is a stupid reason for not implementing proper handling for the SuggestPiece message. Even "I'm too lazy" would be a better excuse as you'd at least be acknowledging that you have an incomplete implementation.

A "proper" implementation of the SuggestPiece message would be to request that piece off the peer that sent you the suggestion at the earliest convenient time, i.e. the very next piece request off them should be for the suggested piece.

Lets assume that every torrent client implemented the suggest piece message handling as i described above. Now, i'll replay the situation above using SuggestPiece messages as opposed to selective piece advertising.

Firstly, you advertise having all the pieces as being available when you connect to each peer (this requires no special extra logic). Every time you receive a request from a peer to send them a piece, you load the entire piece into memory. As soon as that piece is loaded into memory, you send a SuggestPiece message (this requires no special extra logic) to every other peer. The other peers will then act on that message (this requires no special extra logic) and their next request from you will be that piece. That way every time you load a piece into memory once, you can then send it from memory to every other peer that wants it.

The benefits of this method are that every time you load a piece into memory, you can make other peers request it off you (if they need that piece). You will *never* have a situation where you will not be uploading. You will *always* have better performance than the initial scenario where you were IO limited. Assuming that you're still doing the same number of random disk reads as before, you will be able to increase your upload rate significantly because each piece that enters memory will be sent to more than 1 peer.

I'd guess that you could increase performance by, at the very least, 5 fold in a torrent with 100 other peers using the SuggestPiece method. The other major advantage is that you require no special logic in either the seeding client or downloading client! All you need is a proper implementation of the SuggestPiece message. It's not that hard!

EDIT: Despite promising myself i wouldn't, i think i will name and shame the clients that completely ignore the SuggestPiece message despite claiming support for the Fast Extensions:


The clients that i know of that fully support the extensions are:


JD said...

That looks like a nice extension. I hadn't heard of it. :) Being disk IO bound is very common on large services. Typically this is handled vertically by RAID, and by horizontally scaling the source of the files. For large scenarios where your seeding from well connected data centers it would be interesting to store your torrent pieces in something like Amazon's S3.

Alan said...

True, but if you have sufficient bandwidth you will still be IO limited when using a Raid 0. The point behind the suggest piece message is to increase the efficieny of the client by allowing it to easily send a single piece to multiple peers by "suggesting" it to them. This by itself will increase the upload rate in IO limited servers considerably!

Hit Counter