Monday, September 29, 2008

tee-sharp a.k.a. CopyStream

How many times have you had one stream, but actually wanted to write its data to multiple places simultaneously[0]? Well, now you can[1]!

I took an hour and spun up this awesome asynchronous beast of a stream splitter. There is one optimisation that could still be applied to it: reads could be performed at the same time as writes. I figured that for a 1.0 implementation, this was good enough. If anyone wants to try their hand at making the read run in parallel with the writes, feel free. Patches are welcome ;)


EDIT: There's also a 'deliberate' bug in there. 10 points to the first person to spot it, and a bonus 10 points if you can fix it with fewer than 5 lines of extra code.


Amber said...

hmmm it could be beneficial to pass something other than null for state in line 106?

No time to look any further right now
(on an AIX job here)

Alan said...

It'd be a waste of time. That is a synchronous function, so it won't return until all the copying has been completed.

When calling async methods, you typically use the 'state' variable to store an object so you can figure out which of your async calls has completed. This is irrelevant with sync methods. This is also why 'callback' is null as well.
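To illustrate the role of 'state', here is a minimal sketch using the standard Stream.BeginRead/EndRead pair. The class name StateDemo, the method ReadWithState and the "first" label are made up for the example and are not from the code in the post.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical demo class; not part of the code from the post.
static class StateDemo
{
    // Starts one async read and returns the 'state' object seen by its callback.
    public static string ReadWithState()
    {
        var source = new MemoryStream(new byte[] { 1, 2, 3 });
        var buffer = new byte[3];
        string label = null;

        using (var finished = new ManualResetEvent(false))
        {
            // The last argument is the 'state'; it comes back as result.AsyncState.
            source.BeginRead(buffer, 0, buffer.Length, result =>
            {
                label = (string)result.AsyncState; // tells us which call completed
                source.EndRead(result);
                finished.Set();
            }, "first");

            finished.WaitOne(); // block until the callback has run
        }
        return label;
    }
}
```

With several reads in flight, each call would pass a different state object, and the shared callback could tell them apart through AsyncState.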

Jonathan Pryor said...

1. TeeStream isn't a great name, as it's not actually a Stream subclass.

2. Would you be at all interested in adding this to Mono.Rocks? I think it could be munged into a set of extension methods, e.g.

static class StreamRocks {
    public static void WriteTo (this Stream self, params Stream[] destinations);
    public static void WriteTo (this Stream self, IEnumerable<Stream> destinations);
}

Though I'm at a loss for what the API for the async version should look like...
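For the synchronous case, the bodies of such extension methods might look roughly like the sketch below. This is only an assumption about how they could be written, not code from Mono.Rocks or from the post; the class name StreamRocksSketch and the 4096-byte buffer size are arbitrary choices.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch of synchronous 'WriteTo' extension methods.
static class StreamRocksSketch
{
    public static void WriteTo(this Stream self, params Stream[] destinations)
    {
        // Forward to the IEnumerable<Stream> overload.
        WriteTo(self, (IEnumerable<Stream>)destinations);
    }

    public static void WriteTo(this Stream self, IEnumerable<Stream> destinations)
    {
        // Copy the sequence once so it is not re-enumerated for every chunk.
        var targets = new List<Stream>(destinations);
        var buffer = new byte[4096];
        int count;

        // Read a chunk, then write that same chunk to every destination.
        while ((count = self.Read(buffer, 0, buffer.Length)) > 0)
        {
            foreach (Stream target in targets)
                target.Write(buffer, 0, count);
        }
    }
}
```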

Amber said...

(a) frankly I don't fully grok all the async going on
(b) I don't have the time to work with the code to see how it works

So from my visual inspection only, wouldn't BeginCopy finally end up calling EndWrite somehow? In 174/176 this would result in an error since that asyncresult seems to be set to null?

I'm having the sneaking suspicion that multiple *different* async-result instances are being used.

I'd venture that if my understanding of this code is so far off that I'm asking a silly question here, perhaps this means that better variable naming is in order to distinguish *what* result is being passed/used for *what* purpose.

I must admit, there is much beauty in this async beast, but (as I have learned in my Perl days) beauty should never come with obfuscation :)

Alan said...

Well, the bug in the code is that it's possible for the copyResult to be 'completed' more than once, meaning the callback could be invoked more than once, which is bad. When you work with Async code all the time, you notice these things pretty quickly ;)
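One common way to guard against a double completion is an atomic flag that lets only the first caller through. The sketch below is an illustration of that idea, not necessarily the fix intended for the original code; OneShotCallback and TryInvoke are hypothetical names.

```csharp
using System;
using System.Threading;

// Hypothetical helper; the names are illustrative and not from the post's code.
sealed class OneShotCallback
{
    readonly Action callback;
    int invoked; // 0 = not yet invoked, 1 = already invoked

    public OneShotCallback(Action callback)
    {
        this.callback = callback;
    }

    // Returns true only for the single call that actually ran the callback.
    public bool TryInvoke()
    {
        // Exchange is atomic, so exactly one caller ever sees the old value 0.
        if (Interlocked.Exchange(ref invoked, 1) != 0)
            return false;
        callback();
        return true;
    }
}
```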

The thing to note in the code is that the 'copyResult' is like the *global* IAsyncResult. This will only get signalled when the entire operation has been completed.

The idea is:
1) Read a chunk of data asynchronously
2) When that read is finished, begin writing to all destinations at the same time. Store the 'IAsyncResults' from these writes in a list.
3) As each write finishes, remove its IAsyncResult from the list. When the list is empty, all the writes have finished.
4) Go to step 1 unless all data has been read. Once all data is read, complete the 'copyResult'.
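The four steps above can be sketched roughly as follows. This is a simplified illustration and not the code from the post: the class TeeCopy and its members are hypothetical, completion is reported through a plain Action<Exception> rather than an IAsyncResult, and a counter of pending writes stands in for the list of IAsyncResults.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical sketch of the read/write loop; not the code from the post.
sealed class TeeCopy
{
    readonly Stream source;
    readonly IList<Stream> destinations;
    readonly Action<Exception> done; // receives null on success
    readonly byte[] buffer = new byte[4096];
    int pendingWrites; // writes still outstanding for the current chunk
    int completed;     // 0 until 'done' has been invoked

    public TeeCopy(Stream source, IList<Stream> destinations, Action<Exception> done)
    {
        this.source = source;
        this.destinations = destinations;
        this.done = done;
    }

    public void Start() { ReadNext(); }

    // Step 1: read a chunk asynchronously.
    void ReadNext()
    {
        try { source.BeginRead(buffer, 0, buffer.Length, OnRead, null); }
        catch (Exception e) { Complete(e); }
    }

    // Step 2: when the read finishes, start a write on every destination.
    void OnRead(IAsyncResult result)
    {
        try
        {
            int count = source.EndRead(result);
            if (count == 0) { Complete(null); return; } // step 4: all data read
            if (destinations.Count == 0) { ReadNext(); return; }

            pendingWrites = destinations.Count;
            foreach (Stream destination in destinations)
                destination.BeginWrite(buffer, 0, count, OnWrite, destination);
        }
        catch (Exception e) { Complete(e); }
    }

    // Step 3: the last write to finish triggers the next read.
    void OnWrite(IAsyncResult result)
    {
        try
        {
            ((Stream)result.AsyncState).EndWrite(result);
            if (Interlocked.Decrement(ref pendingWrites) == 0)
                ReadNext();
        }
        catch (Exception e) { Complete(e); }
    }

    // Signals completion at most once, even if several callbacks fail.
    void Complete(Exception error)
    {
        if (Interlocked.Exchange(ref completed, 1) == 0)
            done(error);
    }
}
```

Note that each write passes its destination stream as the 'state' so that OnWrite knows which stream to call EndWrite on. If the underlying streams complete their operations synchronously, the callbacks nest, so a production version would want to guard against deep recursion on very large inputs.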

Andy Hume said...

Have you seen ".NET Matters: Stream Pipeline" and ".NET Matters: Asynchronous Stream Processing"? It's a while since I read those articles, but I think they're in the same ballpark.


