Sunday, December 20, 2009

Expression Trees - serializing your data

Update: Just to clarify - the code snippets below are under the MIT/X11 license.

I spent a few hours over the weekend writing a binary serializer using expression trees. I wanted to see how things would look using the new features available in .NET 4.0. My requirements were pretty simple:

1) Serialize all public properties in a type or a subset of them
2) Control the order in which they're serialized - sometimes you need to interop with an existing format and you must write your data in a specific order
3) Control how a primitive is converted - Do you need to write value types in big endian, little endian, middle endian?
4) Easy to use API.

So let's start with the API. This is what I was hoping to use:

public class Secondary
{
public int First { get; set; }
public int Second { get; set; }
public int Third { get { return First + Second; } }
}

public class MyClass
{
public byte ByteProp { get; set; }
public short ShortProp { get; set; }
public int IntProp { get; set; }
public long LongProp { get; set; }
public string StringProp { get; set; }
}

static void Main(string[] args)
{
// Register a message so that all public properties will be serialized
Message.Register<MyClass>();

// Register a message so that only some properties are serialized and
// they are serialized in the specified order
Message.Register<Secondary>(
d => d.Second,
d => d.First
);

// Create a stream to serialize the data to
Stream s = new MemoryStream();
var message = new MyClass {
IntProp = 1,
LongProp= 2,
ByteProp= 3,
ShortProp = 4,
StringProp = "Hello World"
};

// Encode the message to the stream
MessageEncoder.Encode(message, s);

// Rewind the stream and then decode the message
s.Position = 0;
var decoded = MessageDecoder.Decode<MyClass>(s);
}

It's pretty standard stuff. You can work with the standard serializer logic (serialize properties alphabetically) by registering an object without specifying any specific properties or you can customise which properties are serialized. This could also be done using attributes, but using attributes to control the order in which properties are serialized would be more error prone than the above.

Sometimes you need to write your data in big endian, other times in little endian. Sometimes you won't care. What you need is to be able to control this:
MessageEncoder.RegisterPrimitiveEncoder<int>((value, stream) => {
stream.Write(BitConverter.GetBytes(value));
});

It's simple. Any type which can be directly converted to an array of bytes is classified as a 'primitive'. Each primitive can have an encoder/decoder pair registered as above.
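As a minimal sketch of what such a pair could look like, here is a little-endian encoder/decoder for int. The class name LittleEndianInt32 is made up for illustration; the two delegates simply have the shapes that RegisterPrimitiveEncoder<int> and RegisterPrimitiveDecoder<int> accept.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original serializer
public static class LittleEndianInt32
{
    // Writes the value least-significant byte first, regardless of host byte order
    public static readonly Action<int, Stream> Encode = (value, stream) => {
        for (int shift = 0; shift < 32; shift += 8)
            stream.WriteByte((byte)((value >> shift) & 0xFF));
    };

    // Reads four bytes back in the same order; throws if the stream ends early
    public static readonly Func<Stream, int> Decode = stream => {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            int b = stream.ReadByte();
            if (b == -1)
                throw new EndOfStreamException();
            result |= b << shift;
        }
        return result;
    };
}
```

Because the bytes are shifted out by hand rather than taken from BitConverter, the result does not depend on the endianness of the machine running the code.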

public static class MessageEncoder
{
static Dictionary<Type, Delegate> encoders;
static Dictionary<Type, Delegate> primitives;

static MessageEncoder()
{
encoders = new Dictionary<Type, Delegate>();
primitives = new Dictionary<Type, Delegate>();
RegisterPrimitiveEncoders();
}

static void RegisterPrimitiveEncoders()
{
RegisterPrimitiveEncoder<byte>((value, stream) =>
stream.WriteByte(value)
);

RegisterPrimitiveEncoder<short>((value, stream) =>
stream.Write(BitConverter.GetBytes(IPAddress.HostToNetworkOrder(value)))
);

RegisterPrimitiveEncoder<int>((value, stream) =>
stream.Write(BitConverter.GetBytes(IPAddress.HostToNetworkOrder(value)))
);

RegisterPrimitiveEncoder<long>((value, stream) =>
stream.Write(BitConverter.GetBytes(IPAddress.HostToNetworkOrder(value)))
);

var intWriter = (Action<int, Stream>)primitives[typeof (int)];
RegisterPrimitiveEncoder<string>((value, stream) => {
var buffer = Encoding.UTF8.GetBytes(value);
intWriter(buffer.Length, stream);
stream.Write(buffer);
});
}

public static void RegisterPrimitiveEncoder<T>(Action<T, Stream> encoder)
{
primitives [typeof (T)] = encoder;
}

public static void RegisterMessage<T>(params Expression<Func<T, object>>[] properties)
{
RegisterMessage<T>(properties.Select(p => p.AsPropertyInfo ()));
}

public static void RegisterMessage<T>(IEnumerable<PropertyInfo> properties)
{
var propertyEncoders = new List<Expression>();

// The encode function takes an instance of the class we're encoding and the Stream
// which we should write the data to.
ParameterExpression source = Expression.Parameter(typeof(T), "source_param");
ParameterExpression stream = Expression.Parameter(typeof(Stream), "stream");

// For each property, get the encoder which will convert the value of the property to a byte[]
// which can be written to the stream.
foreach (var property in properties) {
// Get the encoder for this property type
var action = primitives[property.PropertyType];
// Create a var which holds the Action <T, Stream> which encodes the data to the stream
Expression converter = Expression.Constant(action, action.GetType ());
// Invoke the encoder passing the value of the property and the 'stream'
Expression invoker = Expression.Invoke(converter, Expression.Property(source, property), stream);
// Add the encoder for this property to the list.
propertyEncoders.Add(invoker);
}

// Create an expression block which will execute each of the encoders one by one
Expression block = Expression.Block(propertyEncoders);
encoders.Add(typeof(T), Expression.Lambda<Action<T, Stream>>(
block,
source,
stream
).Compile());
}

public static void Encode<T>(T message, Stream s)
{
var encoder = (Action<T, Stream>)encoders[typeof (T)];
encoder (message, s);
}
}

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Linq.Expressions;
using System.Reflection;
using System.Net;
using System.IO;

namespace Encoder
{
public static class MessageDecoder
{
static Dictionary<Type, Delegate> decoders;
static Dictionary<Type, Delegate> primitives;

static MessageDecoder()
{
decoders = new Dictionary<Type, Delegate>();
primitives = new Dictionary<Type, Delegate>();
RegisterDefaultDecoders();
}

static void RegisterDefaultDecoders()
{
RegisterPrimitiveDecoder<byte>((s) => {
var val = s.ReadByte();
if (val == -1)
throw new EndOfStreamException();
return (byte)val;
});

RegisterPrimitiveDecoder<short>((s) => IPAddress.NetworkToHostOrder (s.ReadShort()));
RegisterPrimitiveDecoder<int>(s => IPAddress.NetworkToHostOrder (s.ReadInt()));
RegisterPrimitiveDecoder<long>(s => IPAddress.NetworkToHostOrder (s.ReadLong()));

var intDecoder = (Func<Stream, int>)primitives[typeof(int)];
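RegisterPrimitiveDecoder<string>(s => {
var buffer = new byte[intDecoder(s)];
// Stream.Read may return fewer bytes than requested, so loop until the buffer is full
int offset = 0;
while (offset < buffer.Length) {
int read = s.Read(buffer, offset, buffer.Length - offset);
if (read == 0)
throw new EndOfStreamException();
offset += read;
}
return Encoding.UTF8.GetString(buffer);
});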
}

public static void RegisterPrimitiveDecoder<T>(Func<Stream, T> decoder)
{
// Use the indexer (like the encoder does) so a default decoder can be replaced
primitives[typeof(T)] = decoder;
}

public static void RegisterMessage<T>(params Expression<Func<T, object>>[] properties)
{
RegisterMessage<T>(properties.Select(d => d.AsPropertyInfo()));
}

public static void RegisterMessage<T>(IEnumerable<PropertyInfo> properties)
{
var propertyDecoders = new List<Expression>();

// The decode function takes an instance of the class we're decoding and the Stream
// containing the data to decode.
ParameterExpression source = Expression.Parameter(typeof(T), "source_param");
ParameterExpression stream = Expression.Parameter(typeof(Stream), "stream");

// For each property, get the primitive decoder which will read data from the stream and
// return a value of the correct type.
foreach (var property in properties) {
var action = primitives[property.PropertyType];
// Create a var which holds the Func <Stream, T> which decodes the data from the stream
Expression decoder = Expression.Constant(action, action.GetType());
// Invoke the decoder passing 'stream' as the parameter
Expression invoker = Expression.Invoke(decoder, stream);
// Store the return value of the decoder in the property.
Expression setter = Expression.Call(source, property.GetSetMethod(), invoker);
// Add the decoder for this property to the list.
propertyDecoders.Add(setter);
}

// Create a block which will execute the decoders for all the fields one after another.
Expression block = Expression.Block(propertyDecoders);
decoders.Add (typeof (T), Expression.Lambda<Action<T, Stream>>(
block,
source,
stream
).Compile ());
}

public static T Decode<T>(Stream s) where T : class, new()
{
T t = new T();
var decoder = (Action<T, Stream>)decoders[typeof(T)];
decoder(t, s);
return t;
}
}
}


The idea is quite simple. For each class we can generate an ideal serializer using expression trees which doesn't require boxing or casting. This way we can avoid the use of reflection when serializing objects and so avoid the performance penalties that come with it. The code above only handles the simple case where a class consists of primitive types (int, long, string), though it'd be easy enough to extend it to support more complex scenarios.
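The listings above call a few helpers that aren't shown: stream.Write(byte[]), s.ReadShort(), s.ReadInt(), s.ReadLong() and AsPropertyInfo(). The following is only a guess at what they might look like, written as extension methods; the class name SerializerHelpers and the private ReadBytes helper are assumptions, not the original code.

```csharp
using System;
using System.IO;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical reconstruction of the helpers the listings rely on
public static class SerializerHelpers
{
    // Write a whole buffer in one call
    public static void Write(this Stream stream, byte[] buffer)
    {
        stream.Write(buffer, 0, buffer.Length);
    }

    // Read exactly 'count' bytes, looping because Stream.Read may return fewer
    static byte[] ReadBytes(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count) {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }

    // These return the raw value in host byte order; the registered decoders
    // then apply IPAddress.NetworkToHostOrder to it
    public static short ReadShort(this Stream stream) { return BitConverter.ToInt16(ReadBytes(stream, 2), 0); }
    public static int ReadInt(this Stream stream) { return BitConverter.ToInt32(ReadBytes(stream, 4), 0); }
    public static long ReadLong(this Stream stream) { return BitConverter.ToInt64(ReadBytes(stream, 8), 0); }

    // Turn 'd => d.Second' into the PropertyInfo for Second
    public static PropertyInfo AsPropertyInfo<T>(this Expression<Func<T, object>> expression)
    {
        Expression body = expression.Body;
        // Value-typed properties are boxed to object, which wraps the access in a Convert node
        if (body.NodeType == ExpressionType.Convert)
            body = ((UnaryExpression)body).Operand;
        var member = body as MemberExpression;
        var property = member == null ? null : member.Member as PropertyInfo;
        if (property == null)
            throw new ArgumentException("Expression must be a simple property access", "expression");
        return property;
    }
}
```

The Convert check matters because the registration API takes Func<T, object>, so an int or long property never arrives as a bare member access.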

The serializer as you see it could not have been written with .NET 3.5. Some of the key components like BlockExpression were only introduced with .NET 4.0. If your object contains an array which needs to be serialized, you'll need the new IndexExpression too. Sure, it's possible to fake these using some anonymous delegates and Actions, but that's not pretty :)

The total implementation is less than 170 LOC. I'd be willing to bet that with another 100 LOC you could support most constructs. If you're currently a heavy user of reflection to provide object serialization, it's time to update ;)

Tuesday, December 15, 2009

New Year's resolutions

It's tradition in quite a lot of countries to make a New Year's resolution on the 1st of January. Most people forget about them within a few days or weeks. This year, I'll be making one I'm going to keep!

I want to take part in a dancing [0] flash mob whether it's in this country or another.



What ideas do you have? Anything strange, interesting, unusual? Leave a comment and let me know, maybe you have a better idea than being part of a flash mob.

[0] Me and dancing don't get on particularly well, so it'll be an interesting challenge ;)

Sunday, December 06, 2009

Yet another INotifyPropertyChanged with Expression Trees - Part 2

In my last post, I described a method whereby you can implement INotifyPropertyChanged with zero performance overhead and near-zero boilerplate code. The only boilerplate left was the delegate you had to create to invoke the event:

public Book()
{
// Boilerplate - eugh!
Action<string> notify = (propertyName) => {
var h = PropertyChanged;
if (h != null)
h(this, new PropertyChangedEventArgs(propertyName));
};

author = new ChangeNotifier<string> (() => Author, notify);
price = new ChangeNotifier<decimal> (() => Price, notify);
quantity = new ChangeNotifier<int> (() => Quantity, notify);
title = new ChangeNotifier<string> (() => Title, notify);
}

The entire point of my implementation was to avoid writing boilerplate, so this was slightly irritating. Unfortunately, there's no trivial way around the problem as the .NET framework really limits what you can do with events. The first thing you'd think of is "pass the actual object into the ChangeNotifier constructor and just raise the event that way". For example my constructors would change to:

new ChangeNotifier<string>(() => Author, this);

That's well and good, right up until you realise that it's impossible for one object to raise an event that's declared on another object.

public class A
{
public event EventHandler MyEvent;
}

public class B
{
public void AccessEvent (A a)
{
// Invalid - you can't raise an event which is declared in another class
a.MyEvent(this, EventArgs.Empty);

// Invalid - you can't copy the event either
EventHandler h = a.MyEvent;
h(this, EventArgs.Empty);
}
}

Another alternative would be to pass the event itself into the ChangeNotifier object:

new ChangeNotifier<string> (() => Author, PropertyChanged);

But this won't work because a copy of the delegate list is created. That means if anyone adds event handlers later on, they won't be invoked when the property changes. So with that stuck firmly in my mind, I never gave much thought to removing that last remaining bit of boilerplate. That's about to change!

What I really want is for my final implementation to look more like this:

public class Book : INotifyPropertyChanged
{
public event PropertyChangedEventHandler PropertyChanged;

ChangeNotifier<string> author;

public string Author
{
get { return author.Value; }
set { author.Value = value; }
}

public Book()
{
author = ChangeNotifier.Create(() => Author, ????);
}
}

That's short and sweet. The generic types should be automatically inferred, you shouldn't have to create the delegate to raise the event, it's beautiful! The only problem is to figure out what I should replace the question marks with. I need something that will allow me to get at the current list of event handlers from outside of the Book object, i.e. something along the lines of this:

Func<PropertyChangedEventHandler> getter = delegate { return PropertyChanged; };
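To see why a getter helps where passing the event directly does not, here is a small sketch. Publisher and SnapshotDemo are made-up names for illustration. The snapshot is taken before anyone subscribes, so it never sees the later handler, while the getter re-reads the event's backing field on every call.

```csharp
using System;

// Hypothetical types used only for this demonstration
public class Publisher
{
    public event EventHandler Changed;

    // Returns the delegate currently stored in the event's backing field
    public EventHandler Snapshot() { return Changed; }

    // Returns a getter that re-reads the backing field each time it is called
    public Func<EventHandler> Getter() { return () => Changed; }
}

public static class SnapshotDemo
{
    // Returns { calls seen via the snapshot, calls seen via the getter }
    public static int[] Run()
    {
        var publisher = new Publisher();
        EventHandler snapshot = publisher.Snapshot();   // null: nobody has subscribed yet
        Func<EventHandler> getter = publisher.Getter();

        int calls = 0;
        publisher.Changed += (o, e) => calls++;         // subscribe after the snapshot was taken

        if (snapshot != null)
            snapshot(publisher, EventArgs.Empty);
        int viaSnapshot = calls;

        EventHandler current = getter();
        if (current != null)
            current(publisher, EventArgs.Empty);
        int viaGetter = calls - viaSnapshot;

        return new[] { viaSnapshot, viaGetter };
    }
}
```

SnapshotDemo.Run() returns { 0, 1 }: the late subscriber is invisible to the snapshot but is reached through the getter.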

Prettying it up a little, this is how my Book class looks:

public class Book : INotifyPropertyChanged
{
public event PropertyChangedEventHandler PropertyChanged;

ChangeNotifier<string> author;

public string Author {
get { return author.Value; }
set { author.Value = value; }
}

public Book()
{
author = ChangeNotifier.Create (() => Author, () => PropertyChanged);
}
}

Beautiful! The more astute readers might notice a problem at this stage. Fine, the ChangeNotifier object can get the event list and raise the event, but it can't fill in the 'sender' - it has no reference to the 'book' object! Have no fear, it's already taken care of! The getter delegate has a reference to the book object (Delegate.Target), so we can fill everything in perfectly! The final implementation of the ChangeNotifier class is this:

public static class ChangeNotifier
{
public static ChangeNotifier<TValue> Create<TValue>(Expression<Func<TValue>> expression, Func<PropertyChangedEventHandler> notifier)
{
return new ChangeNotifier<TValue>(expression, notifier);
}
}

public class ChangeNotifier<TValue>
{
Func<PropertyChangedEventHandler> notifier;
string propertyName;
TValue value;

public TValue Value {
get { return value; }
set {
if (!EqualityComparer<TValue>.Default.Equals(this.value, value)) {
this.value = value;
// Get the current list of registered event handlers
// then invoke them with the correct 'sender' and event args
PropertyChangedEventHandler h = notifier();
if (h != null)
h(notifier.Target, new PropertyChangedEventArgs(propertyName));
}
}
}

public ChangeNotifier(Expression<Func<TValue>> expression, Func<PropertyChangedEventHandler> notifier)
{
if (expression.NodeType != ExpressionType.Lambda)
throw new ArgumentException("Value must be a lambda expression", "expression");
if (!(expression.Body is MemberExpression))
throw new ArgumentException("The body of the expression must be a memberref", "expression");

MemberExpression m = (MemberExpression)expression.Body;
this.notifier = notifier;
this.propertyName = m.Member.Name;
}
}
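The use of notifier.Target as the sender leans on how the C# compiler emits lambdas, so here is a quick way to check it. The Sample class is made up for illustration. A lambda that touches nothing but 'this' is, in current compilers as far as I know, compiled to an instance method on the containing class, which makes Target the object itself. That is an implementation detail rather than a language guarantee, and a lambda which also captures a local variable gets a compiler-generated closure object as its Target instead.

```csharp
using System;
using System.ComponentModel;

// Hypothetical class used only to inspect Delegate.Target
public class Sample : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    // The lambda only reads an instance member, so its Target is expected
    // to be this instance rather than a closure object
    public Func<PropertyChangedEventHandler> CreateGetter()
    {
        return () => PropertyChanged;
    }
}
```

Checking object.ReferenceEquals(sample.CreateGetter().Target, sample) in a unit test is a cheap way to catch the case where someone later adds a captured local to the lambda.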
I have one final trick up my sleeve. Suppose you have a property (Progress) whose value is calculated based on other values (CurrentStep, TotalSteps) and you want to get notifications whenever any of those values changes. Well, that's easy!

public class Worker : INotifyPropertyChanged
{
public event PropertyChangedEventHandler PropertyChanged;

ChangeNotifier<int> currentStep;
ChangeNotifier<int> totalSteps;

public int CurrentStep {
get { return currentStep.Value; }
set { currentStep.Value = value; }
}
public int TotalSteps {
get { return totalSteps.Value; }
set { totalSteps.Value = value; }
}
public double Progress
{
get { return (double)CurrentStep / TotalSteps; }
}

public Worker()
{
Func<PropertyChangedEventHandler> notifier = () => PropertyChanged;

currentStep = ChangeNotifier.Create(() => CurrentStep, notifier);
totalSteps = ChangeNotifier.Create(() => TotalSteps, notifier);

// A PropertyChanged notification will be created for Progress every time
// either the CurrentStep *or* TotalSteps changes.
ChangeNotifier.CreateDependent(
() => Progress,
notifier,
() => CurrentStep,
() => TotalSteps
);
}
}

And the new helper methods are:
public static class ChangeNotifier
{
static string GetPropertyName(Expression expression)
{
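while (!(expression is MemberExpression)) {
if (expression is LambdaExpression)
expression = ((LambdaExpression)expression).Body;
else if (expression is UnaryExpression)
expression = ((UnaryExpression)expression).Operand;
else
// Without this the loop would spin forever on any other kind of expression
throw new ArgumentException("Expression must resolve to a member access", "expression");
}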

return ((MemberExpression)expression).Member.Name;
}

public static void CreateDependent<TValue>(Expression<Func<TValue>> property, Func<PropertyChangedEventHandler> notifier, params Expression<Func<object>>[] dependents)
{
// The name of the property which is dependent on the value of other properties
var name = GetPropertyName(property);
// The names of the other properties
var dependentNames = dependents.Select<Expression, string>(GetPropertyName).ToArray();

INotifyPropertyChanged sender = (INotifyPropertyChanged)notifier.Target;
sender.PropertyChanged += (o, e) => {
// If one of our dependents changes, emit a PropertyChanged notification for our property
if (dependentNames.Contains(e.PropertyName)) {
var h = notifier();
if (h != null)
h(o, new PropertyChangedEventArgs (name));
}
};
}

public static ChangeNotifier<TValue> Create<TValue>(Expression<Func<TValue>> expression, Func<PropertyChangedEventHandler> notifier)
{
return new ChangeNotifier<TValue>(expression, notifier);
}
}

The only change is that I need to use a slightly more complicated method of getting the property name, as value types get wrapped in a Convert node (a UnaryExpression) when they are boxed to object.
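The wrapping is easy to observe. In this sketch (Counter and ConvertNodeDemo are made-up names) the same Func<object> lambda shape produces a different body depending on whether the property is a value type or a reference type:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical types used only for this demonstration
public class Counter
{
    public int Count { get; set; }
    public string Label { get; set; }
}

public static class ConvertNodeDemo
{
    // Reports what kind of node sits at the top of the lambda body
    public static ExpressionType BodyKind(Expression<Func<object>> expression)
    {
        return expression.Body.NodeType;
    }

    public static ExpressionType[] Run()
    {
        var c = new Counter();
        return new[] {
            BodyKind(() => c.Count),  // int must be boxed to object: Convert
            BodyKind(() => c.Label)   // string is already a reference type: MemberAccess
        };
    }
}
```

ConvertNodeDemo.Run() returns { Convert, MemberAccess }, which is why GetPropertyName has to peel off UnaryExpression nodes before it reaches the MemberExpression.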

Saturday, December 05, 2009

Yet another INotifyPropertyChanged with Expression Trees

There are dozens of examples out there showing you how to avoid having to refer to property names as strings when implementing INotifyPropertyChanged. The most important reason why you don't want to have to do this is because property names can get refactored but the hardcoded strings might be forgotten. No-one wants to end up getting a Changed notification for a property which doesn't exist.

My issue with all these examples is that none of them thought far enough ahead. Fine, they all show you how to refer to properties without using hardcoded strings but they still require you to write lots of boilerplate code to raise the PropertyChanged event - boilerplate you have to write for every property. What I want is to be able to declare all my properties like:

public string Title {
get { return title; }
set { title = value; }
}

and yet still get my property change notifications. I also want this method to be reasonably high performance. I don't want every property change to have extra memory or CPU overhead as every developer expects that changing the value of a property will not do any complex calculations. So how can I accomplish this?

To start off with, we can all tell that it's impossible to achieve the required behaviour using just the snippet above. We're going to have to add (at least) one additional level of indirection. That means I should be able to implement my requirements using code like:

public string Title {
get { return title.Value; }
set { title.Value = value; }
}

The object 'title' must then contain all the logic required to raise the property changed notification. So what might this magical object look like?

public class ChangeNotifier<TValue>
{
Action<string> notifyHandler;
string propertyName;
TValue value;

public TValue Value {
get { return value; }
set {
if (!EqualityComparer<TValue>.Default.Equals(this.value, value)) {
this.value = value;
notifyHandler(propertyName);
}
}
}


public ChangeNotifier(Expression<Func<TValue>> expression, Action<string> notifyHandler)
{
if (expression.NodeType != ExpressionType.Lambda)
throw new ArgumentException("Value must be a lambda expression", "expression");
if (!(expression.Body is MemberExpression))
throw new ArgumentException("The body of the expression must be a memberref", "expression");

MemberExpression m = (MemberExpression)expression.Body;
this.propertyName = m.Member.Name;
this.notifyHandler = notifyHandler;
}
}

You're probably looking at this thinking "What the hell is this Expression<Func<TValue>> ? How do I even use that monstrosity?". Well... simples!

public class Book : INotifyPropertyChanged
{
public event PropertyChangedEventHandler PropertyChanged;

ChangeNotifier<string> author;
ChangeNotifier<decimal> price;
ChangeNotifier<int> quantity;
ChangeNotifier<string> title;

public string Author {
get { return author.Value; }
set { author.Value = value; }
}
public decimal Price {
get { return price.Value; }
set { price.Value = value; }
}
public int Quantity {
get { return quantity.Value; }
set { quantity.Value = value; }
}
public string Title {
get { return title.Value; }
set { title.Value = value; }
}

public Book()
{
Action<string> notify = (propertyName) => {
var h = PropertyChanged;
if (h != null)
h(this, new PropertyChangedEventArgs(propertyName));
};

author = new ChangeNotifier<string> (() => Author, notify);
price = new ChangeNotifier<decimal> (() => Price, notify);
quantity = new ChangeNotifier<int> (() => Quantity, notify);
title = new ChangeNotifier<string> (() => Title, notify);
}
}

All that happens here is that when constructing the ChangeNotifier object, an Expression referencing the required property is passed into the constructor, along with a delegate which will raise the PropertyChanged event. We parse that expression tree to retrieve the property name and store it. After that everything Just Works (tm) with little to no performance penalty. The days of writing boilerplate code for INotifyPropertyChanged are gone! You also have the benefit that you can't make a mistake writing the boilerplate code because you don't write it anymore!

Friday, December 04, 2009

Can't you feel the Moonlight? Part deux

As I was saying yesterday, the live version of the Silverlight Toolkit site didn't work right in Moonlight. All the pretty charts rendered as you see them below, very empty.

I figured that since a slightly older version worked near-flawlessly, surely I could fix the live version with only a few minor tweaks. It's not like they would've completely rewritten the Chart controls within the space of 1 release.

I checked everything from DataBinding, to TemplateBinding, to Styles, to Measure/Arrange bugs and nothing was showing up as causing the issue. I finally narrowed it down to a bug in VisualStateGroup. For some reason the Name property was empty even though it was declared with a name in XAML.

One. Tiny. Patch. Later.


Success. I can't believe that the bug was that simple. In the end, those bugs are actually by far the worst. There's no exception thrown or any kind of visible indication that something has failed other than an empty screen. The only reason I found the bug was because the toolkit is open source and I was running it locally with a few dozen Console.WriteLines, gradually reducing the area of code where I thought the bug was. Unfortunately this fix arrived too late for the 1.99.9 release, but it will definitely be in the release after it.

Wednesday, December 02, 2009

Can't you feel the moonlight?

It's time for the obligatory screenshots again. This is what the Data Visualisation demos from the Silverlight Toolkit (March edition) looked like yesterday:

Note the empty graphs. It doesn't look very pretty now, does it? However, one very minor fix later we now have the following:

Things are near-perfect in all the Data Visualization demos. One graph is missing a background colour and the elements in one graph aren't clickable when they should be. Neither should be particularly difficult to fix, the only problem is figuring out the cause.

Unfortunately the version of the Toolkit Demo on the live site still doesn't render perfectly, but as we already have one version near-perfect, getting a newer revision to work shouldn't be hard! Things are shaping up to give us a great 2.0 release.

Monday, November 30, 2009

What does this say?


This is the logo for the London Olympics. Wolff Olins were paid a whopping GBP 400,000 to design this. You'd think that if you spent that kind of money, you'd end up with a logo that can be understood by mere mortals, however that's not what we got.

When I look at that logo I see the numbers '2' and '0' at the top. That's relatively clear. However, the bottom section is just random gibberish. If I concentrate a little harder, I could convince myself that there's a '1' in the bottom left, but what the hell is at the bottom right? A square with a squiggle beside it? I can't for the life of me figure out what that little square is for. Is it part of the '2' or is it there because they wanted to write "20-12"? I really can't tell.

If you're going to design a logo for such an important worldwide event, why can't it be legible? Logo design should be about creating an inventive and visually appealing way of getting some information across. It isn't supposed to be about who can create the most obfuscated advertisement - that defeats the entire purpose! Out of this list of a dozen alternative designs my favourite is this one:

It may not be the best logo I've ever seen, but it does one thing right: It's instantly recognisable and understandable. I know exactly what it's about without having to spend 10 mins rotating the thing trying to make sense out of it.

Thursday, November 19, 2009

Dear Thierry Henry



If you truly were sorry that you twice hit the ball with your hand to prevent it from going wide and then proceeded to score from that opportunity, why didn't you admit it on the spot? You knew what you did, you did it deliberately, it is useless claiming you're sorry now when it's too late for that to mean anything. Your chance to show your true colours was on the field and show them you did.

Since the match will not be replayed, why don't you step down from this World Cup season if you truly do regret that decision which cost Ireland our World Cup chance?

Monday, November 02, 2009

Don't BinaryReader.PeekChar () at me!

Update: As pointed out in the comments, there are errors in the implementation of PeekableStream below. That's what happens when you write a class at 2am ;) The errors are that writing to the stream will be off by 1 byte if you've peeked, and Position will be off by 1 byte if you've peeked. I'll re-read and update the class later.

Update 2: I updated the implementation of PeekableStream to make it a read-only stream wrapper. I'm not pushed about supporting writing on it as it's not something I require. I'll leave it as an exercise to the reader to support writing if they need it ;)

How many times has that urge hit you to peek inside a stream and see the secrets hidden inside? Have there been times when you wished that Stream exposed a 'public byte Peek ()' method so you could do this easily? Were you delighted when you discovered that BinaryReader exposed PeekChar (), which did nearly the same thing [0]? If that describes you, you're in for a horrible, horrible surprise.

I received a bug report during the week saying that MonoTorrent reads all data twice from disk when it is hashing. I responded with "No it doesn't! That'd be crazy!", to which I was handed a screenshot of a ~310MB torrent which in the Windows task manager reported ~750MB of I/O Read Bytes. I was told this happened after loading the torrent and calling HashCheck () on it. This was irrefutable evidence that something was up, but I was still unconvinced.

So over the weekend I fired up windows and double checked. I could replicate the bizarre statistic. But strangely enough, it wasn't hashing that was causing it! The I/O Read Bytes was up at around 350MB before hashing even started. But that was strange, because the only thing that happened before that was:

Torrent torrent = Torrent.Load (path);

There's no way a simple forward-only parser reading a 100kB file could possibly result in 350MB of IO reads, could it? Actually, it could! Changing the declaration to the following made the extra 350MB vanish completely:

Torrent torrent = Torrent.Load (File.ReadAllBytes (path));

So what was going wrong? Internally the parser used BinaryReader.PeekChar () to figure out the type of the next element so that the correct decoder could be called. I thought this would be a simple array access, or something similar. However what actually happens is that one byte is read from the underlying stream, then the stream seeks 1 byte backwards. In the case of FileStream, this meant that the entire read buffer was refilled from 'disk' [1] every time I peeked. A 100kB file really was being turned into a 350MB monstrosity! And yes, the Mono implementation unfortunately has to do the same. So how could I fix this?


Simples! I could write a PeekableStream, one that's smart enough to not need to do horrible buffer killing seeks. What was the end result? Well, that particular .torrent file loaded nearly 5x faster, ~100ms for everything instead of ~500ms. An average file would experience a much smaller speedup. This one is a bit different in that it contains over 2000 files and the speed up is proportional to the number of BEncoded elements in the .torrent file.

public class PeekableStream : Stream
{
bool hasPeek;
Stream input;
byte[] peeked;

public PeekableStream (Stream input)
{
this.input = input;
this.peeked = new byte[1];
}

public override bool CanRead
{
get { return input.CanRead; }
}

public override bool CanSeek
{
get { return input.CanSeek; }
}

public override bool CanWrite
{
get { return false; }
}

public override void Flush()
{
throw new NotSupportedException();
}

public override long Length
{
get { return input.Length; }
}

public int PeekByte()
{
if (!hasPeek)
hasPeek = Read(peeked, 0, 1) == 1;
return hasPeek ? peeked[0] : -1;
}

public override int ReadByte()
{
if (hasPeek)
{
hasPeek = false;
return peeked[0];
}
return base.ReadByte();
}

public override long Position
{
get
{
if (hasPeek)
return input.Position - 1;
return input.Position;
}
set
{
if (value != Position)
{
hasPeek = false;
input.Position = value;
}
}
}

public override int Read(byte[] buffer, int offset, int count)
{
int read = 0;
if (hasPeek && count > 0)
{
hasPeek = false;
buffer[offset] = peeked[0];
offset++;
count--;
read++;
}
read += input.Read(buffer, offset, count);
return read;
}

public override long Seek(long offset, SeekOrigin origin)
{
long val;
if (hasPeek && origin == SeekOrigin.Current)
val = input.Seek(offset - 1, origin);
else
val = input.Seek(offset, origin);
hasPeek = false;
return val;
}

public override void SetLength(long value)
{
throw new NotSupportedException();
}

public override void Write(byte[] buffer, int offset, int count)
{
throw new NotSupportedException();
}
}


This code is under the MIT/X11 license, so everywhere you use PeekChar () and actually just want to peek at a byte, use this class instead. Your harddrives will love you for it. If you actually want to peek at a char, extend this class to be able to read a (multi-byte) char from the underlying stream and cache it locally just like the current PeekByte method. A side benefit is that you can now peek at unseekable streams. Not too bad, eh?

[0] PeekChar does exactly what it says on the tin. It reads one (multi-byte) character from the stream. So if you're using PeekChar on a binary stream which does not contain valid data as defined by the current Encoding, you're going to corrupt some data or get exceptions. I mention this here in case anyone is using PeekChar () as a way of reading bytes from the stream.

[1] I say that the FileStream buffer was being filled from disk, but that's not quite accurate. It was actually being refilled from either the Windows cache or the harddrive's cache. It's physically impossible for a mere 7200 RPM harddrive to supply data that fast. However I was still physically copying 350MB of data around in memory so that was a huge penalty right there.

Sunday, October 18, 2009

MonoTorrent 0.80 - Up up and away

MonoTorrent 0.80 has been released. I'd like to say "It's the best release ever", but that always makes me think "If it wasn't the best release ever, why would I release it?"

The full release notes can be read on www.monotorrent.com. For the lazy, I'll put a quick blurb about the two most exciting new features:

Metadata Exchange
http://www.bittorrent.org/beps/bep_0009.html

Put simply, this means you can click on a link like this: magnet:?xt=urn:btih:12345678901234567890 and then the torrent will magically [0] be able to download. Behind the scenes what happens is that peers are found via DHT and then they are queried for the .torrent metadata. Once the metadata has been obtained, the actual downloading can commence and away you go. Finally, I can start a download via text message!
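
As a rough illustration of the first step, here is a minimal sketch of pulling the infohash text out of a magnet link like the one above. The MagnetLink class and GetInfoHash method are hypothetical names made up for this example and are not part of the MonoTorrent API; the sketch only handles the xt=urn:btih: parameter and ignores everything else in the link.

```csharp
using System;

public static class MagnetLink
{
    // Hypothetical helper: returns the text following "urn:btih:" in the
    // xt parameter of a magnet link, or null if the link has no such parameter.
    public static string GetInfoHash(string magnet)
    {
        const string scheme = "magnet:?";
        const string prefix = "xt=urn:btih:";

        if (magnet == null || !magnet.StartsWith(scheme, StringComparison.OrdinalIgnoreCase))
            return null;

        // Parameters are separated by '&', just like a normal query string.
        foreach (string part in magnet.Substring(scheme.Length).Split('&'))
        {
            if (part.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                return part.Substring(prefix.Length);
        }
        return null;
    }

    public static void Main()
    {
        // Prints 12345678901234567890
        Console.WriteLine(GetInfoHash("magnet:?xt=urn:btih:12345678901234567890"));
    }
}
```

The returned string is what would then be used to look up peers via DHT.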

Local Peer Discovery

This allows MonoTorrent to find other peers who are downloading the same torrent on the local network. A simple UDP broadcast message is used for discovery. This is an implementation of uTorrent-style LPD and so is fully compatible with uTorrent and other clients which have implemented this style. The main benefit of this is that in corporate or educational environments, it's possible that many people will be trying to access the same torrent at the same time. This approach allows all these peers to connect to each other and thus transfer the bulk of their data over the internal LAN rather than all of them fighting for bandwidth on the (usually) limited WAN connection.
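
For the curious, the discovery message itself is tiny. The sketch below builds one in the format described in BEP 14 (an HTTP-like BT-SEARCH announce sent over UDP to a well-known group address). The class and method names are invented for this example, the address and port are the ones BEP 14 lists rather than anything read out of MonoTorrent's source, and the infohash is a made-up placeholder.

```csharp
using System;
using System.Text;

public static class LocalPeerDiscovery
{
    // Group address and port as listed in BEP 14 (an assumption for this sketch).
    public const string Group = "239.192.152.143";
    public const int Port = 6771;

    // Builds the announce text for one torrent. listenPort is the TCP port
    // on which this client accepts incoming peer connections.
    public static string BuildAnnounce(string infoHashHex, int listenPort)
    {
        var sb = new StringBuilder();
        sb.Append("BT-SEARCH * HTTP/1.1\r\n");
        sb.Append("Host: ").Append(Group).Append(':').Append(Port).Append("\r\n");
        sb.Append("Port: ").Append(listenPort).Append("\r\n");
        sb.Append("Infohash: ").Append(infoHashHex).Append("\r\n");
        sb.Append("\r\n\r\n");
        return sb.ToString();
    }

    public static void Main()
    {
        // Placeholder infohash, 40 hex characters.
        string message = BuildAnnounce("0123456789ABCDEF0123456789ABCDEF01234567", 6881);
        byte[] payload = Encoding.ASCII.GetBytes(message);
        Console.Write(message);

        // Actually sending it would look roughly like:
        // new System.Net.Sockets.UdpClient().Send(payload, payload.Length, Group, Port);
    }
}
```

Any client on the LAN listening on that group which has the same torrent can then connect back on the advertised port.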

As per usual, there are a bunch of bug fixes and enhancements. This is one more milestone on the way to the final 1.0 release. One which I'm really looking forward to. I might even do some nostalgia posts about the big disasters I created while learning C# and implementing this library ;)

[0] Actual product does not contain magic.

EDIT: Just clarified that the LPD implementation is the uTorrent style.

Monday, June 29, 2009

Mono.Nat 1.0.2

I just tagged and released Mono.Nat 1.0.2. It's a fairly minor bugfix release which addresses a number of minor issues:
  • Added workaround for certain versions of miniupnpd which incorrectly advertise their available services (bug has been reported upstream)
  • Fixed some other minor issues with routers reporting incorrect services.
  • Added extra API to make it easy to log the full handshake/request process to help diagnose issues
  • Stopping and Starting discovery will rediscover all available devices correctly
  • Full support for computers with multiple network cards on multiple subnets
  • Rewrote the internals to ensure that the asynchronous API is 100% asynchronous - prevents calls to BeginXXX blocking on some slower routers.
Precompiled binaries and source code can be downloaded here and packages will soon be winding their way to a repository near you.

If you want to forward ports automagically on a UPnP-empowered router near you, this is the library for you!

Thursday, May 28, 2009

Monsoon - blowing down barriers

Yes, Monsoon is now using Mono.Addins for some delicious pluggability. Support for this has only just been added, so there is a severe lack of extension points defined in Monsoon, but those can be added as time goes on. Right now there is one extension point. I'm sure you've already guessed what it is.


Yup, that's right. That little nifty thing at the bottom is DHT bootstrapping itself. As you can see, it's currently displaying a rather disappointing value of '1' for 'Nodes'. This is either because the bootstrap node is currently unavailable, or my router is playing silly buggers again and not forwarding my ports right. Luckily you can also bootstrap into DHT by just downloading a normal torrent. Other peers advertise when they support DHT and provide the required info to allow you to use them as a bootstrap node.

What this means is that openSUSE users will finally have easy access to a DHT enabled torrent client without having to enable additional repositories. Things will work right out of the box... well, it'll work as soon as you click to enable the addin which fetches it from the Monsoon website ;)

Once I solidify everything there'll be a preview release of Monsoon with these features and another slightly big one I've been working on. More on that later. I've reached my word quota for this post ;)

Monday, May 18, 2009

Book memes - Reloaded

* Grab the nearest book.
* Open it to page 56.
* Find the fifth sentence.
* Post the text of the sentence in your journal along with these instructions.
* Don't dig for your favorite book, the cool book, or the intellectual one: pick the CLOSEST.

"Ponder and Ridcully waited for a few moments, but the city stayed full of normal noise, like the collapse of masonry and distant screams"

Not a bad quote, eh? And yes, I am still a kid at heart :p

Sunday, May 17, 2009

Polymorphism, why do you fail me?

Polymorphism, it's the cornerstone of object oriented programming. We couldn't live without it. So then, tell me why this rather trivial case fails to compile.

public class A { }
public class B : A { }

public void Foo (ref A bar) { }

public void Baz ()
{
B b = new B ();
Foo (ref b); // compile error: cannot convert from 'ref B' to 'ref A'
}


The issue is that you can't pass a 'B' parameter type by ref where a 'ref A' is expected. Why is this? What case could possibly fail if this was allowed?
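
One case that would fail, sketched below: a ref parameter can be written to as well as read, so the method is free to store any A in it, including one that is not a B. The class C and the body of Foo are additions made up for this illustration. The sketch uses the usual workaround of going through a local variable of type A, which shows what the caller's variable would end up holding.

```csharp
using System;

public class A { }
public class B : A { }
public class C : A { }   // hypothetical sibling class, added for the illustration

public static class RefDemo
{
    // Perfectly legal: 'bar' is declared as A, so any A may be assigned to it.
    public static void Foo(ref A bar)
    {
        bar = new C();
    }

    public static void Main()
    {
        B b = new B();

        // Foo(ref b);   // if this compiled, 'b' (declared as B) would now hold a C

        A a = b;         // workaround: pass a variable whose type really is A
        Foo(ref a);

        Console.WriteLine(a is C);   // True
        Console.WriteLine(a is B);   // False
    }
}
```

After the call the variable holds a C, which is exactly what a variable declared as B must never contain, so the compiler insists the variable's type matches the ref parameter's type exactly.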

Friday, May 15, 2009

So what've I been working on?

I haven't actually blogged much about my job. That was getting kind of weird, so I thought it was high time I wrote something. Then I thought to myself "A picture is worth a thousand words, so why bother wasting my time with words". So here are two pictures which show what I've been doing over the last 5 days. Though technically I've been working on parts of this over the last 2-3 weeks ;)

Moonlight - Monday 11th May 2009:


Moonlight - Friday 15th May 2009 (with local patches):


Pretty sweet, eh?

UPDATE: If you want to look at the actual site, it's available here: http://silverlight.net/samples/sl2/toolkitcontrolsamples/run/default.html

Hopefully our next alpha preview will contain the necessary fixes to load the site up so you can see it in all its glory.

Monday, April 20, 2009

Monsoon 0.21

Monsoon 0.21 has been released. It now uses MonoTorrent 0.72. Other than bumping to a newer version of MonoTorrent, there were only a few minor fixes for Monsoon 0.21, the most notable of which is a fix for a change in Firefox's behaviour when opening files directly from the browser.

The update should come to a repository near you soon!

Thursday, April 16, 2009

nat-pmp support in Mono.Nat

If there's anyone out there with a week or so to spare who could complete the implementation of nat-pmp support in Mono.Nat, that'd be awesome. The implementation is 80-90% complete. The only requirement is that you have a router supporting nat-pmp to test against. Unfortunately I don't have one and I can't get the daemon working on my router.

If you want to work on it, send me an email and I'll fill you in on what needs to be done.

Monday, April 06, 2009

MonoTorrent 0.72 released

This is a bugfix release to address a few reported issues and also a few issues that were discovered via my own testing.

* Added a helper method which ensures all data is flushed to disk
* Added additional error handling to prevent malformed DHT messages crashing the library
* Fixed issue when zeroing unused bits for torrents with an exact multiple of 32 pieces
* Fixed issue where data could be written to the wrong file if a file with the same name existed in multiple torrents
* Fixed the handling of torrents where the last file(s) are of zero length
* Fixed regression with global download rate limiting
* Fixed a performance regression with the new piece picking pipeline which resulted in lots of CPU cycles being used up on peers which have not sent an unchoke message

In other news, monotorrent.com is changing its hosting provider. It still brings you to the old website, but it'll be moved to http://projects.qnetp.net/projects/show/monotorrent soon enough. This is where future releases will be made.

Tuesday, March 31, 2009

Google Summer of Code - 2009

The final deadline for GSoC applications is fast approaching. You now have 4 days left, right up until 19:00 UTC on April 3rd. There are so many cool projects available this year, so make sure you apply soon before they're all taken!

Some of my personal favourites would be:
Writing SIMD code in C#
VDPAU VC-1/H.264 Support
Codec Demuxers
A Dirac decoder

And many more! Funnily enough, video encoding is what got me interested in programming in the beginning, so if you students don't snap up those projects, I will ;) Hack on something cool for the summer and earn yourself some cash while you're at it. It'll be great!

Sunday, March 22, 2009

We got the grand slam!



We've done it! We've won the Grand Slam for the first time in 61 years. It was the most nail-biting match I've watched in a while. It was so close that the result was actually down to the very last kick of the match, which was a penalty awarded* against Ireland. Thankfully Wales missed it and sealed Ireland's victory.

Go on the green!

* The ref made a bad call here and awarded a penalty against Ireland because the ball bounced forward after it was dropped. The rules are pretty clear that this should be a scrum and *not* a penalty. It wasn't the first time he made that mistake either. Still, it didn't matter in the end, woo!

Update: A friend just told me that the ref originally signalled a scrum but then changed it to a penalty. Sounds like an Irish guy mouthed off or did something off the ball. Did anyone notice what actually made him change his mind?

Friday, March 20, 2009

MonoTorrent 0.70

With all the excitement going on, I forgot to blog about MonoTorrent 0.70, so here's a belated summary of what went on:

* Fixed an issue for torrents with no trackers
* Optimised the Bitfield class to allow for higher performance piece picking
* Rewrote the piece picking API which resulted in a very extensible, easily testable, less buggy and faster implementation
* Fixed an issue where the announce wouldn't happen immediately after a torrent completes
* Fixed several issues with webseeding and rate limiting
* Fixed an issue which resulted in an incorrect encryption level being chosen in a small proportion of cases
* Ensure that announces to trackers always time out correctly
* Fixed a race condition when stopping a torrent while an incoming connection is being processed
* Increased the performance of disk IO so that hashing a torrent is about 10% faster
* Vastly improved performance of the BanList parser
* Fixed issue where fast pieces could be requested multiple times
* Fixed issue with selective downloading where the start/end indices of files could be offset by 1

A precompiled binary can be found here and the source tarball can be found here. Enjoy!

Sunday, March 15, 2009

What does it take to download a torrent?

The simple answer to this question is "All you need is the .torrent file. Stupid n00bs." This is true. The .torrent file contains all the metadata required to A) download the data and B) verify the data. Without this, you can't really do anything. So, do you actually need the .torrent file to begin a download?

No! All you need is the infohash for the .torrent file! The infohash is the SHA1 hash of the metadata in a .torrent file. As such, it can be used as a unique identifier for a particular .torrent file. With this infohash, you can query the BitTorrent DHT for a list of peers downloading that torrent. Then, with the help of the Metadata Exchange extension, you can connect to these peers and request that they send you the metadata from the .torrent file and you're away and downloading. Great!

"But what if some malicious peer sends you corrupt metadata, then you'd never be able to download the torrent properly!", I hear you asking. Well, in a rather beautiful twist, this is next to impossible. As I said earlier, the infohash is generated by putting the metadata in the .torrent file through a SHA1 hash. So all you have to do is hash the metadata once you have received it and then compare the result of that to the SHA1 hash you used to start the download. If they match, then you can be fairly confident that the metadata has not been corrupted/altered in any way.

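The check described above can be sketched in a few lines. This is only an illustration: the MetadataCheck class is a made-up name rather than MonoTorrent's API, and the "metadata" is a small invented bencoded dictionary standing in for what a peer would send.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class MetadataCheck
{
    public static byte[] Sha1(byte[] data)
    {
        using (SHA1 sha1 = SHA1.Create())
            return sha1.ComputeHash(data);
    }

    // True only if the received metadata hashes to the infohash
    // that was used to start the download.
    public static bool Matches(byte[] receivedMetadata, byte[] expectedInfoHash)
    {
        return Sha1(receivedMetadata).SequenceEqual(expectedInfoHash);
    }

    public static void Main()
    {
        // Invented sample metadata; a real torrent's would be much larger.
        byte[] metadata = Encoding.ASCII.GetBytes(
            "d6:lengthi5e4:name5:hello12:piece lengthi16384ee");
        byte[] infoHash = Sha1(metadata);

        // Simulate a malicious peer altering a single byte.
        byte[] tampered = (byte[])metadata.Clone();
        tampered[0] = (byte)'x';

        Console.WriteLine(Matches(metadata, infoHash));   // True
        Console.WriteLine(Matches(tampered, infoHash));   // False
    }
}
```

Changing even one byte gives a completely different hash, so altered metadata is simply thrown away and requested again from another peer.
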
As of 17:00GMT, March 15th MonoTorrent has completed its first download using only a 20 byte hash to begin the download. This is possible because of some tireless work by Olivier Dufour, who also implemented Peer Exchange, a good few parts of DHT, WebSeeding and SuperSeeding. The code for this still hasn't quite hit SVN, as a bit of refactoring remains to be done. It should be in SVN within a week. I'm looking forward to his next patch of awesomeness now.

Monday, March 02, 2009

The monsoon-devel list

I've started a mailing list for monsoon so that packagers can be kept updated and various interested people can post questions and all that jazz. Basically it's the standard -devel mailing list. That's mostly why it's called "monsoon-devel" ;)

If you're interested in keeping up to date on Monsoon, please join the group at: http://groups.google.com/group/monsoon-devel

Friday, February 27, 2009

Banshee - For all your medical record needs?

I'm sure everyone's familiar with Banshee, the most awesome media player of all time ever ;) So, how does it feel to know it has expanded into the medical record field? Good eh? Next time you go to your doctor do you know what they'll be running? That's right, Banshee - Medical Record Edition. Check out its website here:

http://gnumed.org/index.html

The regular banshee website can be found here:

http://banshee-project.org/

It's fun to open both in tabs and switch between them.

Thursday, February 19, 2009

Monsoon 0.20

Monsoon 0.20 has been released and should be in a repository near you soon. Release notes can be found here. Fun stuff!

Saturday, January 24, 2009

Excitement abounds!

Next weekend is going to be an interesting one. Several releases are coinciding:

1) MonoTorrent 0.70 is prepped and ready for release. Contains numerous bugfixes and performance enhancements.
2) Monsoon 0.20 will be in a repository near you. This will be using MonoTorrent 0.70, so a nice upgrade from the previous 0.40 release.
3) The DBus daemon for monotorrent is ready for its first release. Banshee will be using this to add support for torrent based podcasts.
4) Mono.Nat will be getting its first official package release too. It's a reasonably mature library which supports UPnP port forwarding/mapping. There's also support for nat-pmp in the library, but it's disabled due to lack of testing. If someone has a nat-pmp capable router, feel free to test the code, fix the few remaining issues and submit a patch.

But for now, it's holiday time for a week ;)
