Saturday, February 03, 2007

Asynchronous Process - Can't live with it, can't live without it

There are often times where you have two tasks running simultaenously at different speeds which depend on each other. For example, imagine transferring a file from computer X to computer Y. The simplistic approach is for computer X to transfer a part of the file to computer Y, wait for computer Y to write the data to the disk, then transfer the next part. This is a "pipelined" approach. It's good enough for most purposes, but if you have a large communication delay between the two computers, performance will suffer. However this method is very stable as you know exactly how much memory is needed and how much CPU time is needed to transfer and write a single block of data.

A second approach is for computer X to pass a part of the file to computer Y and (without waiting for Y to finish writing the data to disk) begin transferring the next part. This second approach can result in a *much* faster transfer rate when there is a large communication delay between the two computers simply because computer X no longer has to wait for the "I have written the data to disk" reply from computer Y.

However, the problem with the second approach is that if computer X sends 100 blocks of data a second and computer Y can only write 50 blocks of data a second to the disk, we either end up losing data by just dumping it or we end up storing it all in memory until we can write it to disk and memory usage climbs dramatically!

So, in order to gain the performance of the second method with the stability of the first method, you need some way to send feedback to computer X that it is sending data to fast. This is the problem i've just encountered in MonoTorrent when i converted the disk writes to be fully asynchronous to avoid blocking the main downloading thread. When i seed off my local machine, i can't write fast enough so i end up storing 1000's of chunks of data in memory until i tell my seeding client to stop. How easy this will be to fix, i'm not sure. But i am sure it's doable ;)

3 comments:

Anonymous said...

Maybe an algorithm similar to tcp "slow start" can be used, sending two new blocks from X for each block ack you receive from Y.

Anonymous said...

Can't you just use a bounded queue between your workers ?

Thierry Mallard said...

This scenario seems very much the case where one can use the "continuations" techniques.

On Mono, projects like Monoco[1] may be interesting.

On a related subject, programming languages like Erlang[2] may be interesting to look at, as this language is very concurrency oriented.

[1] http://www.bat.org/~tomba/monoco.html

[2] http://www.erlang.org

Best regards

-- Thierry Mallard

Hit Counter