Sunday, December 20, 2009

Expression Trees - serializing your data

Update: Just to clarify - the code snippets below are under the MIT/X11 license.

I spent a few hours over the weekend writing a binary serializer using expression trees. I wanted to see how things would look using the new features available in .NET 4.0. My requirements were pretty simple:

1) Serialize all public properties in a type or a subset of them
2) Control the order in which they're serialized - sometimes you need to interop with an existing system and you must write your data in a specific order
3) Control how a primitive is converted - Do you need to write value types in big endian, little endian, middle endian?
4) Easy to use API.

So let's start with the API. This is what I was hoping to use:

public class Secondary
{
    public int First { get; set; }
    public int Second { get; set; }
    public int Third { get { return First + Second; } }
}

public class MyClass
{
    public byte ByteProp { get; set; }
    public short ShortProp { get; set; }
    public int IntProp { get; set; }
    public long LongProp { get; set; }
    public string StringProp { get; set; }
}

static void Main(string[] args)
{
    // Register a message so that all public properties will be serialized
    Message.Register<MyClass>();

    // Register a message so that only some properties are serialized and
    // they are serialized in the specified order
    Message.Register<Secondary>(
        d => d.Second,
        d => d.First
    );

    // Create a stream to serialize the data to
    Stream s = new MemoryStream();
    var message = new MyClass {
        IntProp = 1,
        LongProp = 2,
        ByteProp = 3,
        ShortProp = 4,
        StringProp = "Hello World"
    };

    // Encode the message to the stream
    MessageEncoder.Encode(message, s);

    // Rewind the stream and then decode the message
    s.Position = 0;
    var decoded = MessageDecoder.Decode<MyClass>(s);
}

It's pretty standard stuff. You can work with the standard serializer logic (serialize properties alphabetically) by registering a type without specifying any properties, or you can choose which properties are serialized and in what order. This could also be done using attributes, but using attributes to control the order in which properties are serialized would be more error prone than the above.
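The registration lambdas have to be turned into PropertyInfo objects at some point. Here is a minimal sketch of how that might look. The listings further down call a helper named AsPropertyInfo whose body is not shown in this post, so the implementation here is a guess; DefaultProperties and the Sample class are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical sample type, mirroring the shape of 'Secondary' above.
public class Sample
{
    public int First { get; set; }
    public int Second { get; set; }
    public string Name { get; set; }
    public int Third { get { return First + Second; } }
}

public static class PropertyExpressionExtensions
{
    // Assumed shape of the AsPropertyInfo helper: pull the PropertyInfo out of 'd => d.Prop'.
    public static PropertyInfo AsPropertyInfo<T>(this Expression<Func<T, object>> expression)
    {
        Expression body = expression.Body;

        // A value-type property is boxed to object, which wraps the access in a Convert node.
        if (body.NodeType == ExpressionType.Convert)
            body = ((UnaryExpression)body).Operand;

        var member = body as MemberExpression;
        var property = member == null ? null : member.Member as PropertyInfo;
        if (property == null)
            throw new ArgumentException("Expected a simple property access such as d => d.First");
        return property;
    }

    // Assumed default case: every public instance property with a public getter and setter,
    // sorted by name so the wire order does not depend on declaration order.
    public static PropertyInfo[] DefaultProperties<T>()
    {
        return typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetGetMethod() != null && p.GetSetMethod() != null)
            .OrderBy(p => p.Name, StringComparer.Ordinal)
            .ToArray();
    }
}
```

Read-only properties such as Third are skipped in the default case because the decoder would have no setter to call.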

Sometimes you need to write your data in big endian, other times you need little endian. Sometimes you won't care. What you need is to be able to control this:
MessageEncoder.RegisterPrimitiveEncoder<int>((value, stream) => {
    stream.Write(BitConverter.GetBytes(value));
});

It's simple. Any type which can be directly converted to an array of bytes is classified as a 'primitive'. Each primitive can have an encoder/decoder pair registered as above.
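To make the endianness point concrete, here is a small sketch of two int encoders that produce a fixed byte order regardless of the host CPU. The class and method names are made up for this example; only BitConverter, Array.Reverse and Stream.Write are real framework calls.

```csharp
using System;
using System.IO;

// Hypothetical helpers, not part of the serializer in this post.
public static class EndianEncoders
{
    // Writes a 32-bit integer most significant byte first (network order).
    public static void WriteBigEndian(int value, Stream stream)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        // BitConverter uses the host byte order, so flip on little endian machines.
        if (BitConverter.IsLittleEndian)
            Array.Reverse(bytes);
        stream.Write(bytes, 0, bytes.Length);
    }

    // Writes a 32-bit integer least significant byte first.
    public static void WriteLittleEndian(int value, Stream stream)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        // Flip only when the host is big endian.
        if (!BitConverter.IsLittleEndian)
            Array.Reverse(bytes);
        stream.Write(bytes, 0, bytes.Length);
    }
}
```

Both methods match Action<int, Stream>, so either one could be passed straight to RegisterPrimitiveEncoder<int> as a method group. The matching decoder has to read the same byte order back.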

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Linq.Expressions;
using System.Net;
using System.Reflection;
using System.Text;

// Note: stream.Write(byte[]) and AsPropertyInfo() are small extension methods which are
// not shown in this post. Write(byte[]) is assumed to forward to
// Stream.Write(buffer, 0, buffer.Length).
public static class MessageEncoder
{
    static Dictionary<Type, Delegate> encoders;
    static Dictionary<Type, Delegate> primitives;

    static MessageEncoder()
    {
        encoders = new Dictionary<Type, Delegate>();
        primitives = new Dictionary<Type, Delegate>();
        RegisterPrimitiveEncoders();
    }

    static void RegisterPrimitiveEncoders()
    {
        RegisterPrimitiveEncoder<byte>((value, stream) =>
            stream.WriteByte(value)
        );

        // Multi-byte integers are written in network (big endian) order by default
        RegisterPrimitiveEncoder<short>((value, stream) =>
            stream.Write(BitConverter.GetBytes(IPAddress.HostToNetworkOrder(value)))
        );

        RegisterPrimitiveEncoder<int>((value, stream) =>
            stream.Write(BitConverter.GetBytes(IPAddress.HostToNetworkOrder(value)))
        );

        RegisterPrimitiveEncoder<long>((value, stream) =>
            stream.Write(BitConverter.GetBytes(IPAddress.HostToNetworkOrder(value)))
        );

        // Strings are written as a length prefix followed by the UTF-8 bytes
        var intWriter = (Action<int, Stream>)primitives[typeof (int)];
        RegisterPrimitiveEncoder<string>((value, stream) => {
            var buffer = Encoding.UTF8.GetBytes(value);
            intWriter(buffer.Length, stream);
            stream.Write(buffer);
        });
    }

    public static void RegisterPrimitiveEncoder<T>(Action<T, Stream> encoder)
    {
        // Using the indexer lets callers replace a default encoder
        primitives [typeof (T)] = encoder;
    }

    public static void RegisterMessage<T>(params Expression<Func<T, object>>[] properties)
    {
        RegisterMessage<T>(properties.Select(p => p.AsPropertyInfo ()));
    }

    public static void RegisterMessage<T>(IEnumerable<PropertyInfo> properties)
    {
        var propertyEncoders = new List<Expression>();

        // The encode function takes an instance of the class we're encoding and the Stream
        // which we should write the data to.
        ParameterExpression source = Expression.Parameter(typeof(T), "source_param");
        ParameterExpression stream = Expression.Parameter(typeof(Stream), "stream");

        // For each property, get the encoder which will write the value of the property
        // to the stream.
        foreach (var property in properties) {
            // Get the encoder for this property type
            var action = primitives[property.PropertyType];
            // Create a constant which holds the Action <T, Stream> which encodes the data to the stream
            Expression converter = Expression.Constant(action, action.GetType ());
            // Invoke the encoder passing the value of the property and the 'stream'
            Expression invoker = Expression.Invoke(converter, Expression.Property(source, property), stream);
            // Add the encoder for this property to the list.
            propertyEncoders.Add(invoker);
        }

        // Create an expression block which will execute each of the encoders one by one
        Expression block = Expression.Block(propertyEncoders);
        encoders.Add(typeof(T), Expression.Lambda<Action<T, Stream>>(
            block,
            source,
            stream
        ).Compile());
    }

    public static void Encode<T>(T message, Stream s)
    {
        var encoder = (Action<T, Stream>)encoders[typeof (T)];
        encoder (message, s);
    }
}

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Linq.Expressions;
using System.Reflection;
using System.Net;
using System.IO;

// Note: s.ReadShort(), s.ReadInt(), s.ReadLong() and AsPropertyInfo() are small extension
// methods which are not shown in this post.
namespace Encoder
{
    public static class MessageDecoder
    {
        static Dictionary<Type, Delegate> decoders;
        static Dictionary<Type, Delegate> primitives;

        static MessageDecoder()
        {
            decoders = new Dictionary<Type, Delegate>();
            primitives = new Dictionary<Type, Delegate>();
            RegisterDefaultDecoders();
        }

        static void RegisterDefaultDecoders()
        {
            RegisterPrimitiveDecoder<byte>((s) => {
                var val = s.ReadByte();
                if (val == -1)
                    throw new EndOfStreamException();
                return (byte)val;
            });

            RegisterPrimitiveDecoder<short>((s) => IPAddress.NetworkToHostOrder (s.ReadShort()));
            RegisterPrimitiveDecoder<int>(s => IPAddress.NetworkToHostOrder (s.ReadInt()));
            RegisterPrimitiveDecoder<long>(s => IPAddress.NetworkToHostOrder (s.ReadLong()));

            var intDecoder = (Func<Stream, int>)primitives[typeof(int)];
            RegisterPrimitiveDecoder<string>(s => {
                var length = intDecoder(s);
                var buffer = new byte[length];
                // Stream.Read may return fewer bytes than requested, so keep reading
                // until the buffer is full.
                int offset = 0;
                while (offset < buffer.Length) {
                    int read = s.Read(buffer, offset, buffer.Length - offset);
                    if (read == 0)
                        throw new EndOfStreamException();
                    offset += read;
                }
                return Encoding.UTF8.GetString(buffer);
            });
        }

        public static void RegisterPrimitiveDecoder<T>(Func<Stream, T> decoder)
        {
            // Use the indexer (as the encoder does) so a default decoder can be replaced;
            // Add would throw if the type was already registered.
            primitives[typeof(T)] = decoder;
        }

        public static void RegisterMessage<T>(params Expression<Func<T, object>>[] properties)
        {
            RegisterMessage<T>(properties.Select(d => d.AsPropertyInfo()));
        }

        public static void RegisterMessage<T>(IEnumerable<PropertyInfo> properties)
        {
            var propertyDecoders = new List<Expression>();

            // The decode function takes an instance of the class we're decoding and the Stream
            // containing the data to decode.
            ParameterExpression source = Expression.Parameter(typeof(T), "source_param");
            ParameterExpression stream = Expression.Parameter(typeof(Stream), "stream");

            // For each property, get the primitive decoder which will read data from the stream and
            // return a value of the correct type.
            foreach (var property in properties) {
                var action = primitives[property.PropertyType];
                // Create a constant which holds the Func <Stream, T> which decodes the data from the stream
                Expression decoder = Expression.Constant(action, action.GetType());
                // Invoke the decoder passing 'stream' as the parameter
                Expression invoker = Expression.Invoke(decoder, stream);
                // Store the return value of the decoder in the property.
                Expression setter = Expression.Call(source, property.GetSetMethod(), invoker);
                // Add the decoder for this property to the list.
                propertyDecoders.Add(setter);
            }

            // Create a block which will execute the decoders for all the properties one after another.
            Expression block = Expression.Block(propertyDecoders);
            decoders.Add (typeof (T), Expression.Lambda<Action<T, Stream>>(
                block,
                source,
                stream
            ).Compile ());
        }

        public static T Decode<T>(Stream s) where T : class, new()
        {
            T t = new T();
            var decoder = (Action<T, Stream>)decoders[typeof(T)];
            decoder(t, s);
            return t;
        }
    }
}
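The listings above lean on a few stream helpers (Write(byte[]), ReadShort, ReadInt, ReadLong) whose bodies are not part of this post. The following is one plausible way to write them, offered as an assumption and not as the original code. ReadFully is a name invented here. The readers return the raw bytes interpreted in host order, which is what the NetworkToHostOrder calls in the decoder expect.

```csharp
using System;
using System.IO;

// Assumed implementations of the stream helpers used by the encoder and decoder.
public static class StreamExtensions
{
    // Convenience overload so callers can write a whole buffer in one call.
    public static void Write(this Stream stream, byte[] buffer)
    {
        stream.Write(buffer, 0, buffer.Length);
    }

    // Reads exactly 'count' bytes, looping because Stream.Read may return fewer.
    public static byte[] ReadFully(this Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count) {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }

    public static short ReadShort(this Stream stream)
    {
        return BitConverter.ToInt16(stream.ReadFully(2), 0);
    }

    public static int ReadInt(this Stream stream)
    {
        return BitConverter.ToInt32(stream.ReadFully(4), 0);
    }

    public static long ReadLong(this Stream stream)
    {
        return BitConverter.ToInt64(stream.ReadFully(8), 0);
    }
}
```

On newer runtimes Stream already has a single-argument Write that accepts a byte array through a span conversion, in which case the Write extension here is simply never picked.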


The idea is quite simple. For each class we can generate an ideal serializer using expression trees which doesn't require boxing or casting. Reflection is only needed once, when the type is registered; after that the compiled delegate does all the work, so we avoid the performance penalties reflection incurs on every serialize call. The code above only handles the simple case where a class consists of primitive types (int, long, string), though it'd be easy enough to extend it to support more complex scenarios.

The serializer as you see it could not have been written with .NET 3.5, the first release to ship expression trees. Some of the key components like BlockExpression were only introduced with .NET 4.0. If your object contains an array which needs to be serialized, you'll need the new IndexExpression too. Sure, it's possible to fake these using some anonymous delegates and Actions, but that's not pretty :)
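For the array case, a rough sketch of what an encoder built from the .NET 4.0 node types could look like is below. ArrayEncoderSketch and Build are hypothetical names and this is not part of the serializer above; it only shows IndexExpression (via Expression.ArrayAccess) working together with a local variable, a loop and a block.

```csharp
using System;
using System.IO;
using System.Linq.Expressions;

// Hypothetical example: compiles an Action that encodes every element of a T[].
public static class ArrayEncoderSketch
{
    public static Action<T[], Stream> Build<T>(Action<T, Stream> elementEncoder)
    {
        ParameterExpression array = Expression.Parameter(typeof(T[]), "array");
        ParameterExpression stream = Expression.Parameter(typeof(Stream), "stream");
        ParameterExpression index = Expression.Variable(typeof(int), "i");
        LabelTarget done = Expression.Label("done");

        // array[i], represented as an IndexExpression
        IndexExpression element = Expression.ArrayAccess(array, index);

        // elementEncoder(array[i], stream)
        Expression encode = Expression.Invoke(Expression.Constant(elementEncoder), element, stream);

        // while (true) { if (i < array.Length) { encode; i++; } else break; }
        Expression loop = Expression.Loop(
            Expression.IfThenElse(
                Expression.LessThan(index, Expression.ArrayLength(array)),
                Expression.Block(encode, Expression.PostIncrementAssign(index)),
                Expression.Break(done)),
            done);

        // { int i = 0; loop }
        Expression body = Expression.Block(
            new[] { index },
            Expression.Assign(index, Expression.Constant(0)),
            loop);

        return Expression.Lambda<Action<T[], Stream>>(body, array, stream).Compile();
    }
}
```

A real array encoder would probably write the element count first so the decoder knows how many items to read back; that is left out here to keep the sketch short.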

The total implementation is less than 170 LOC. I'd be willing to bet that with another 100 LOC you could support most constructs. If you're currently a heavy user of reflection to provide object serialization, it's time to update ;)

9 comments:

Don said...

> If you're currently a heavy user of to provide object serialization, it's time to update ;)

I think you missed something here.

WorldMaker said...

This is also really cool, Alan. Would you mind explicitly declaring a license (add useful comments to the top of the file blocks) for this and the ChangeNotifier classes? (Ms-PL would certainly be awesome, if you don't have a particular license in mind.)

Also, maybe it's time to collect these into one or more source code repositories and post them to Bitbucket or Github or Launchpad or somewhere.

Alan said...

@Don: *doh*. Blogger has developed an annoying bug where it deletes two words instead of one when you're editing. I catch it most of the time, but obviously not there ;) I meant to have the word "reflection" in there. Post updated.

@WorldMaker: I updated to explicitly put it under the MIT/X11 license. The idea of putting them in a VCS is nice alright. I may end up doing that. But then I'd have to keep coming up with useful snippets or the repository would get lonely ;)

Jonathan Pryor said...

@Alan: Throw your useful snippets into Cadenza. It's a snippet repository. :-)

http://gitorious.org/cadenza

Chat with us on ##csharp at irc.freenode.org.
