MonoTorrent: November 2009

This is the logo for the London Olympics. Wolff Olins were paid a whopping GBP 400,000 to design this. You'd think that if you spent that kind of money, you'd end up with a logo that can be understood by mere mortals. However, that's not what we got.

When I look at that logo I see the numbers '2' and '0' at the top. That's relatively clear. However, the bottom section is just random gibberish. If I concentrate a little harder, I could convince myself that there's a '1' in the bottom left, but what the hell is at the bottom right? A square with a squiggle beside it? I can't for the life of me figure out what that little square is for. Is it part of the '2' or is it there because they wanted to write "20-12"? I really can't tell.

If you're going to design a logo for such an important worldwide event, why can't it be legible? Logo design should be about creating an inventive and visually appealing way of getting some information across. It isn't supposed to be about who can create the most obfuscated advertisement - that defeats the entire purpose! Out of this list of a dozen alternative designs, my favourite is this one:

It may not be the best logo I've ever seen, but it does one thing right: It's instantly recognisable and understandable. I know exactly what it's about without having to spend 10 mins rotating the thing trying to make sense out of it.

Update: As pointed out in the comments, there are errors in the implementation of PeekableStream below. That's what happens when you write a class at 2am ;) The errors are that writing to the stream will be off by 1 byte if you've peeked, and Position will be off by 1 byte if you've peeked. I'll re-read and update the class later.

Update 2: I updated the implementation of PeekableStream to make it a read-only stream wrapper. I'm not pushed about supporting writing on it as it's not something I require. I'll leave it as an exercise to the reader to support writing if they need it ;)

How many times has that urge hit you to peek inside a stream and see the secrets hidden inside? Have there been times when you wished that Stream exposed a 'public byte Peek ()' method so you could do this easily? Were you delighted when you discovered that BinaryReader exposed PeekChar (), which did nearly the same thing [0]? If that describes you, you're in for a horrible, horrible surprise.

I received a bug report during the week saying that MonoTorrent reads all data twice from disk when it is hashing. I responded with "No it doesn't! That'd be crazy!", to which I was handed a screenshot of a ~310MB torrent for which the Windows Task Manager reported ~750MB of I/O Read Bytes. I was told this happened after loading the torrent and calling HashCheck () on it. This was irrefutable evidence that something was up, but I was still unconvinced.

So over the weekend I fired up Windows and double checked. I could replicate the bizarre statistic. But strangely enough, it wasn't hashing that was causing it! The I/O Read Bytes figure was up at around 350MB before hashing even started. That was strange, because the only thing that happened before that was:

Torrent torrent = Torrent.Load (path);

There's no way a simple forward-only parser reading a 100kB file could possibly result in 350MB of I/O reads, could it? Actually, it could! Changing the call to the following made the extra 350MB vanish completely:

Torrent torrent = Torrent.Load (File.ReadAllBytes (path));

So what was going wrong? Internally the parser used BinaryReader.PeekChar () to figure out the type of the next element so that the correct decoder could be called. I thought this would be a simple array access, or something similar. However, what actually happens is that one byte is read from the underlying stream, then the stream seeks 1 byte backwards. In the case of FileStream, this meant that the entire read buffer was refilled from 'disk' [1] every time I peeked. A 100kB file really was being turned into a 350MB monstrosity! And yes, the Mono implementation unfortunately has to do the same. So how could I fix this?
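To make the pattern concrete, here is a minimal sketch of peeking by "read one byte, then step back one byte", as described above. It is not BinaryReader's actual source. SeekCountingStream and PeekBySeeking are hypothetical names invented for this illustration, and the wrapper only counts repositioning requests; it does not measure how much data a FileStream re-reads.

```csharp
using System;
using System.IO;

// Hypothetical read-only wrapper that counts how often the caller repositions the stream.
public class SeekCountingStream : Stream
{
    readonly Stream inner;
    public int SeekCount { get; private set; }

    public SeekCountingStream(Stream inner) { this.inner = inner; }

    public override bool CanRead { get { return inner.CanRead; } }
    public override bool CanSeek { get { return inner.CanSeek; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { return inner.Length; } }

    public override long Position
    {
        get { return inner.Position; }
        set { SeekCount++; inner.Position = value; }
    }

    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return inner.Read(buffer, offset, count);
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        SeekCount++;
        return inner.Seek(offset, origin);
    }

    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

public static class SeekingPeek
{
    // Peek by consuming one byte and then seeking back over it.
    // Every successful peek costs one reposition of the underlying stream.
    public static int PeekBySeeking(Stream stream)
    {
        int value = stream.ReadByte();
        if (value != -1)
            stream.Seek(-1, SeekOrigin.Current);
        return value;
    }
}
```

Wrapping a stream in SeekCountingStream and calling PeekBySeeking once per parsed element shows one reposition per peek. On a buffered stream such as FileStream, each of those repositions is what can throw away the read buffer.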

Simples! I could write a PeekableStream, one that's smart enough to not need to do horrible buffer killing seeks. What was the end result? Well, that particular .torrent file loaded nearly 5x faster, ~100ms for everything instead of ~500ms. An average file would experience a much smaller speedup. This one is a bit different in that it contains over 2000 files and the speed up is proportional to the number of BEncoded elements in the .torrent file.

public class PeekableStream : Stream
{
    bool hasPeek;
    Stream input;
    byte[] peeked;

    public PeekableStream (Stream input)
    {
        this.input = input;
        this.peeked = new byte[1];
    }

    public override bool CanRead
    {
        get { return input.CanRead; }
    }

    public override bool CanSeek
    {
        get { return input.CanSeek; }
    }

    public override bool CanWrite
    {
        get { return false; }
    }

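    public override void Flush()
    {
        // Nothing to flush: this wrapper is read-only, so make this a no-op
        // rather than throwing at callers that flush every stream they hold.
    }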

    public override long Length
    {
        get { return input.Length; }
    }

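    public int PeekByte()
    {
        // Read one byte ahead and cache it. Repeated peeks return the cached
        // byte without touching the underlying stream. Returns -1 at end of stream.
        if (!hasPeek)
            hasPeek = input.Read(peeked, 0, 1) == 1;
        return hasPeek ? peeked[0] : -1;
    }

    public override int ReadByte()
    {
        // Hand back the cached byte first, if there is one.
        if (hasPeek)
        {
            hasPeek = false;
            return peeked[0];
        }
        // No cached byte, so read directly from the wrapped stream.
        return input.ReadByte();
    }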

    public override long Position
    {
        get
        {
            if (hasPeek)
                return input.Position - 1;
            return input.Position;
        }
        set
        {
            if (value != Position)
            {
                hasPeek = false;
                input.Position = value;
            }
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = 0;
        if (hasPeek && count > 0)
        {
            hasPeek = false;
            buffer[offset] = peeked[0];
            offset++;
            count--;
            read++;
        }
        read += input.Read(buffer, offset, count);
        return read;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long val;
        if (hasPeek && origin == SeekOrigin.Current)
            val = input.Seek(offset - 1, origin);
        else
            val = input.Seek(offset, origin);
        hasPeek = false;
        return val;
    }

    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }
}

This code is under the MIT/X11 license, so everywhere you use PeekChar () and actually just want to peek at a byte, use this class instead. Your hard drives will love you for it. If you actually want to peek at a char, extend this class to be able to read a (multi-byte) char from the underlying stream and cache it locally, just like the current PeekByte method. A side benefit is that you can now peek at unseekable streams. Not too bad, eh?
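As a rough illustration of the kind of dispatch a BEncode parser does on a peeked byte, here is a small sketch. BEncodeKind and BEncodeSketch.Classify are hypothetical names made up for this example, not MonoTorrent's actual API. The only thing it relies on is the BEncoding convention that integers start with 'i', lists with 'l', dictionaries with 'd', byte strings with a decimal length, and that 'e' closes an element.

```csharp
using System;

// Hypothetical classification of the next BEncoded element.
public enum BEncodeKind { Integer, List, Dictionary, String, End, EndOfStream, Invalid }

public static class BEncodeSketch
{
    // Map a peeked byte (or -1 at end of stream) to the kind of element that follows.
    public static BEncodeKind Classify(int peeked)
    {
        if (peeked == -1) return BEncodeKind.EndOfStream;
        if (peeked == 'i') return BEncodeKind.Integer;
        if (peeked == 'l') return BEncodeKind.List;
        if (peeked == 'd') return BEncodeKind.Dictionary;
        if (peeked == 'e') return BEncodeKind.End;
        // Byte strings are prefixed with their length in decimal digits.
        if (peeked >= '0' && peeked <= '9') return BEncodeKind.String;
        return BEncodeKind.Invalid;
    }
}
```

With the PeekableStream class above, a parser could call BEncodeSketch.Classify (stream.PeekByte ()) before every element and pick a decoder from the result, without ever seeking the underlying stream.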

[0] PeekChar does exactly what it says on the tin. It reads one (multi-byte) character from the stream. So if you're using PeekChar on a binary stream which does not contain valid data as defined by the current Encoding, you're going to corrupt some data or get exceptions. I mention this here in case anyone is using PeekChar () as a way of reading bytes from the stream.

[1] I say that the FileStream buffer was being filled from disk, but that's not quite accurate. It was actually being refilled from either the Windows cache or the hard drive's cache. It's physically impossible for a mere 7200 RPM hard drive to supply data that fast. However, I was still physically copying 350MB of data around in memory, so that was a huge penalty right there.


Monday, November 30, 2009

What does this say?

Thursday, November 19, 2009

Dear Thierry Henry

Monday, November 02, 2009

Don't BinaryReader.PeekChar () at me!
