I spent a few hours over the weekend writing a binary serializer using expression trees. I wanted to see how things would look using the new features available in .NET 4.0. My requirements were pretty simple:
1) Serialize all public properties in a type or a subset of them
2) Control the order in which they're serialized - sometimes you need to interop with an existing format and you must write your data in a specific order
3) Control how a primitive is converted - Do you need to write value types in big endian, little endian, middle endian?
4) Easy to use API.
So let's start with the API. This is what I was hoping to use:
public class Secondary
{
public int First { get; set; }
public int Second { get; set; }
public int Third { get { return First + Second; } }
}
public class MyClass
{
public byte ByteProp { get; set; }
public short ShortProp { get; set; }
public int IntProp { get; set; }
public long LongProp { get; set; }
public string StringProp { get; set; }
}
static void Main(string[] args)
{
// Register a message so that all public properties will be serialized
Message.Register<MyClass>();
// Register a message so that only some fields are serialized and
// they are serialized in the specified order
Message.Register<Secondary>(
d => d.Second,
d => d.First
);
// Create a stream to serialize the data to
Stream s = new MemoryStream();
var message = new MyClass {
IntProp = 1,
LongProp= 2,
ByteProp= 3,
ShortProp = 4,
StringProp = "Hello World"
};
// Encode the message to the stream
MessageEncoder.Encode(message, s);
// Rewind the stream and then decode the message
s.Position = 0;
var decoded = MessageDecoder.Decode<MyClass>(s);
}
It's pretty standard stuff. You can use the default serializer logic (serialize properties alphabetically) by registering a type without specifying any properties, or you can customise which properties are serialized and in what order. This could also be done using attributes, but using attributes to control the order in which properties are serialized would be more error-prone than the above.
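The listings in this post don't show how the default (alphabetical) case picks its properties. One way it might be done is sketched below; `DefaultProperties.For<T>()` is a made-up helper for illustration, and the choice to skip properties without a public getter and setter is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public class Secondary
{
    public int First { get; set; }
    public int Second { get; set; }
    public int Third { get { return First + Second; } }
}

public static class DefaultProperties
{
    // Hypothetical helper: public instance properties that can be both read
    // and written, sorted by name so encoder and decoder agree on the order.
    public static List<PropertyInfo> For<T>()
    {
        return typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetIndexParameters().Length == 0)
            .Where(p => p.GetGetMethod() != null && p.GetSetMethod() != null)
            .OrderBy(p => p.Name, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main()
    {
        // Third has no setter, so it is skipped.
        var names = For<Secondary>().Select(p => p.Name).ToArray();
        Console.WriteLine(string.Join(", ", names)); // First, Second
    }
}
```

Sorting with an ordinal comparer keeps the order independent of the machine's culture settings, which matters when both ends of a connection must agree on the layout.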
Next, the primitives. Sometimes you need to write your data in big endian, other times in little endian. Sometimes you won't care. What you need is a way to control this:
MessageEncoder.RegisterPrimitiveEncoder<int>((value, stream) => {
stream.Write(BitConverter.GetBytes(value));
});
It's simple. Any type which can be directly converted to an array of bytes is classified as a 'primitive'. Each primitive can have an encoder/decoder pair registered as above.
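To make the byte-order point concrete, here is a small sketch of the two conversions a custom int encoder would choose between. It is independent of the serializer, and the names `EndianDemo`, `BigEndian` and `LittleEndian` are invented for this example.

```csharp
using System;

public static class EndianDemo
{
    // Big endian (network order): most significant byte first.
    public static byte[] BigEndian(int value)
    {
        var bytes = BitConverter.GetBytes(value);
        if (BitConverter.IsLittleEndian)
            Array.Reverse(bytes);
        return bytes;
    }

    // Little endian: least significant byte first.
    public static byte[] LittleEndian(int value)
    {
        var bytes = BitConverter.GetBytes(value);
        if (!BitConverter.IsLittleEndian)
            Array.Reverse(bytes);
        return bytes;
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(BigEndian(1)));    // 00-00-00-01
        Console.WriteLine(BitConverter.ToString(LittleEndian(1))); // 01-00-00-00
    }
}
```

`BigEndian` produces the same bytes as the `BitConverter.GetBytes(IPAddress.HostToNetworkOrder(value))` combination used by the default encoders, whatever the host's byte order.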
public static class MessageEncoder
{
static Dictionary<Type, Delegate> encoders;
static Dictionary<Type, Delegate> primitives;
static MessageEncoder()
{
encoders = new Dictionary<Type, Delegate>();
primitives = new Dictionary<Type, Delegate>();
RegisterPrimitiveEncoders();
}
static void RegisterPrimitiveEncoders()
{
RegisterPrimitiveEncoder<byte>((value, stream) =>
stream.WriteByte(value)
);
RegisterPrimitiveEncoder<short>((value, stream) =>
stream.Write(BitConverter.GetBytes(IPAddress.HostToNetworkOrder(value)))
);
RegisterPrimitiveEncoder<int>((value, stream) =>
stream.Write(BitConverter.GetBytes(IPAddress.HostToNetworkOrder(value)))
);
RegisterPrimitiveEncoder<long>((value, stream) =>
stream.Write(BitConverter.GetBytes(IPAddress.HostToNetworkOrder(value)))
);
var intWriter = (Action<int, Stream>)primitives[typeof (int)];
RegisterPrimitiveEncoder<string>((value, stream) => {
var buffer = Encoding.UTF8.GetBytes(value);
intWriter(buffer.Length, stream);
stream.Write(buffer);
});
}
public static void RegisterPrimitiveEncoder<T>(Action<T, Stream> encoder)
{
primitives [typeof (T)] = encoder;
}
public static void RegisterMessage<T>(params Expression<Func<T, object>>[] properties)
{
RegisterMessage<T>(properties.Select(p => p.AsPropertyInfo ()));
}
public static void RegisterMessage<T>(IEnumerable<PropertyInfo> properties)
{
var propertyEncoders = new List<Expression>();
// The encode function takes an instance of the class we're encoding and the Stream
// which we should write the data to.
ParameterExpression source = Expression.Parameter(typeof(T), "source_param");
ParameterExpression stream = Expression.Parameter(typeof(Stream), "stream");
// For each property, get the primitive encoder which will write the value of the property
// to the stream.
foreach (var property in properties) {
// Get the encoder for this property type
var action = primitives[property.PropertyType];
// Create a var which holds the Action <T, Stream> which encodes the data to the stream
Expression converter = Expression.Constant(action, action.GetType ());
// Invoke the encoder passing the value of the property and the 'stream'
Expression invoker = Expression.Invoke(converter, Expression.Property(source, property), stream);
// Add the encoder for this property to the list.
propertyEncoders.Add(invoker);
}
// Create an expression block which will execute each of the encoders one by one
Expression block = Expression.Block(propertyEncoders);
encoders.Add(typeof(T), Expression.Lambda<Action<T, Stream>>(
block,
source,
stream
).Compile());
}
public static void Encode<T>(T message, Stream s)
{
var encoder = (Action<T, Stream>)encoders[typeof (T)];
encoder (message, s);
}
}
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Linq.Expressions;
using System.Reflection;
using System.Net;
using System.IO;
namespace Encoder
{
public static class MessageDecoder
{
static Dictionary<Type, Delegate> decoders;
static Dictionary<Type, Delegate> primitives;
static MessageDecoder()
{
decoders = new Dictionary<Type, Delegate>();
primitives = new Dictionary<Type, Delegate>();
RegisterDefaultDecoders();
}
static void RegisterDefaultDecoders()
{
RegisterPrimitiveDecoder<byte>((s) => {
var val = s.ReadByte();
if (val == -1)
throw new EndOfStreamException();
return (byte)val;
});
RegisterPrimitiveDecoder<short>((s) => IPAddress.NetworkToHostOrder (s.ReadShort()));
RegisterPrimitiveDecoder<int>(s => IPAddress.NetworkToHostOrder (s.ReadInt()));
RegisterPrimitiveDecoder<long>(s => IPAddress.NetworkToHostOrder (s.ReadLong()));
var intDecoder = (Func<Stream, int>)primitives[typeof(int)];
RegisterPrimitiveDecoder<string>(s => {
var length = intDecoder(s);
var buffer = new byte[length];
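// Stream.Read may return fewer bytes than requested, so loop until the buffer is full
int offset = 0;
while (offset < buffer.Length) {
int read = s.Read(buffer, offset, buffer.Length - offset);
if (read == 0)
throw new EndOfStreamException();
offset += read;
}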
return Encoding.UTF8.GetString(buffer);
});
}
public static void RegisterPrimitiveDecoder<T>(Func<Stream, T> decoder)
{
// Use the indexer so a default decoder can be replaced, matching RegisterPrimitiveEncoder
primitives[typeof(T)] = decoder;
}
public static void RegisterMessage<T>(params Expression<Func<T, object>>[] properties)
{
RegisterMessage<T>(properties.Select(d => d.AsPropertyInfo()));
}
public static void RegisterMessage<T>(IEnumerable<PropertyInfo> properties)
{
var propertyDecoders = new List<Expression>();
// The decode function takes an instance of the class we're decoding and the Stream
// containing the data to decode.
ParameterExpression source = Expression.Parameter(typeof(T), "source_param");
ParameterExpression stream = Expression.Parameter(typeof(Stream), "stream");
// For each property, get the primitive decoder which will read data from the stream and
// return a value of the correct type.
foreach (var property in properties) {
var action = primitives[property.PropertyType];
// Create a var which holds the Func <Stream, T> which decodes the data from the stream
Expression decoder = Expression.Constant(action, action.GetType());
// Invoke the decoder passing 'stream' as the parameter
Expression invoker = Expression.Invoke(decoder, stream);
// Store the return value of the decoder in the property.
Expression setter = Expression.Call(source, property.GetSetMethod(), invoker);
// Add the decoder for this property to the list.
propertyDecoders.Add(setter);
}
// Create a block which will execute the decoders for all the fields one after another.
Expression block = Expression.Block(propertyDecoders);
decoders.Add (typeof (T), Expression.Lambda<Action<T, Stream>>(
block,
source,
stream
).Compile ());
}
public static T Decode<T>(Stream s) where T : class, new()
{
T t = new T();
var decoder = (Action<T, Stream>)decoders[typeof(T)];
decoder(t, s);
return t;
}
}
}
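Both listings call a few helpers that aren't shown: `stream.Write(byte[])`, `s.ReadShort()`, `s.ReadInt()`, `s.ReadLong()` and `AsPropertyInfo()`. Below is a sketch of what they could look like as extension methods. The method names come from the listings; the bodies, the class names and the `ReadBytes` helper are assumptions.

```csharp
using System;
using System.IO;
using System.Linq.Expressions;
using System.Reflection;

public static class StreamExtensions
{
    // Write a whole buffer in one call.
    public static void Write(this Stream s, byte[] buffer)
    {
        s.Write(buffer, 0, buffer.Length);
    }

    // Hypothetical helper: read exactly 'count' bytes or throw.
    public static byte[] ReadBytes(this Stream s, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count) {
            int read = s.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }

    // These return the raw value as stored in the stream; the registered
    // decoders then apply IPAddress.NetworkToHostOrder to it.
    public static short ReadShort(this Stream s) { return BitConverter.ToInt16(s.ReadBytes(2), 0); }
    public static int ReadInt(this Stream s) { return BitConverter.ToInt32(s.ReadBytes(4), 0); }
    public static long ReadLong(this Stream s) { return BitConverter.ToInt64(s.ReadBytes(8), 0); }
}

public static class ExpressionExtensions
{
    public static PropertyInfo AsPropertyInfo<T>(this Expression<Func<T, object>> expression)
    {
        Expression body = expression.Body;
        // A value-type property is boxed to object, which wraps the
        // member access in a Convert node; unwrap it first.
        if (body.NodeType == ExpressionType.Convert)
            body = ((UnaryExpression)body).Operand;
        var member = body as MemberExpression;
        var property = member == null ? null : member.Member as PropertyInfo;
        if (property == null)
            throw new ArgumentException("Expected a simple property access", "expression");
        return property;
    }
}

public static class HelperDemo
{
    public static void Main()
    {
        var s = new MemoryStream();
        s.Write(BitConverter.GetBytes(42));
        s.Position = 0;
        Console.WriteLine(s.ReadInt()); // 42

        Expression<Func<string, object>> length = str => str.Length;
        Console.WriteLine(length.AsPropertyInfo().Name); // Length
    }
}
```

The boxing in `AsPropertyInfo` only happens once, at registration time, so it doesn't affect the compiled encoders and decoders.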
The idea is quite simple. For each class we can generate an ideal serializer using expression trees which doesn't require boxing or casting. This way we avoid using reflection each time an object is serialized, and with it the performance penalty reflection incurs. The code above only handles the simple case where a class consists of primitive types (int, long, string), though it'd be easy enough to extend it to support more complex scenarios.
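As a quick end-to-end check of the approach, here is a stripped-down, self-contained variant that only handles int properties. `Pair`, `MiniEncoder` and `Build` are names made up for this sketch, and properties are looked up by name here instead of through lambdas to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq.Expressions;

public class Pair
{
    public int First { get; set; }
    public int Second { get; set; }
}

public static class MiniEncoder
{
    // Stand-in primitive encoder: writes an int as 4 big endian bytes.
    static readonly Action<int, Stream> WriteInt = (value, stream) => {
        var bytes = BitConverter.GetBytes(value);
        if (BitConverter.IsLittleEndian)
            Array.Reverse(bytes);
        stream.Write(bytes, 0, bytes.Length);
    };

    public static Action<T, Stream> Build<T>(params string[] propertyNames)
    {
        var source = Expression.Parameter(typeof(T), "source");
        var stream = Expression.Parameter(typeof(Stream), "stream");
        var calls = new List<Expression>();
        foreach (var name in propertyNames) {
            // The property is read as an int and passed straight to the
            // encoder, so the value is never boxed.
            calls.Add(Expression.Invoke(
                Expression.Constant(WriteInt),
                Expression.Property(source, name),
                stream));
        }
        // One block that runs the encoders in the requested order.
        return Expression.Lambda<Action<T, Stream>>(
            Expression.Block(calls), source, stream).Compile();
    }

    public static void Main()
    {
        var encode = Build<Pair>("Second", "First");
        var ms = new MemoryStream();
        encode(new Pair { First = 1, Second = 2 }, ms);
        Console.WriteLine(BitConverter.ToString(ms.ToArray())); // 00-00-00-02-00-00-00-01
    }
}
```

All the reflection happens inside `Build`; the delegate it returns can be cached and reused for every instance of the type.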
The serializer as you see it could not have been written with .NET 3.5, the first release with expression trees. Some of the key components like BlockExpression were only introduced with .NET 4.0. If your object contains an array which needs to be serialized, you'll need the new IndexExpression too. Sure, it's possible to fake these using some anonymous delegates and Actions, but that's not pretty :)
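For the array case, a rough sketch of how the .NET 4.0 nodes fit together might look like this. It is not part of the serializer above: `ArrayEncoderSketch` and `Build` are hypothetical names, and the element encoder is passed in as a plain delegate.

```csharp
using System;
using System.IO;
using System.Linq.Expressions;

public static class ArrayEncoderSketch
{
    public static Action<int[], Stream> Build(Action<int, Stream> writeInt)
    {
        var array = Expression.Parameter(typeof(int[]), "array");
        var stream = Expression.Parameter(typeof(Stream), "stream");
        var i = Expression.Variable(typeof(int), "i");
        var done = Expression.Label("done");
        var writer = Expression.Constant(writeInt);

        // Equivalent to: for (int i = 0; i < array.Length; i++) writeInt(array[i], stream);
        var body = Expression.Block(
            new[] { i },
            Expression.Assign(i, Expression.Constant(0)),
            Expression.Loop(
                Expression.IfThenElse(
                    Expression.LessThan(i, Expression.ArrayLength(array)),
                    Expression.Block(
                        // ArrayAccess produces the IndexExpression for array[i]
                        Expression.Invoke(writer, Expression.ArrayAccess(array, i), stream),
                        Expression.PostIncrementAssign(i)),
                    Expression.Break(done)),
                done));

        return Expression.Lambda<Action<int[], Stream>>(body, array, stream).Compile();
    }

    public static void Main()
    {
        // Toy element encoder: one byte per int, just to keep the output short.
        var encode = Build((value, s) => s.WriteByte((byte)value));
        var ms = new MemoryStream();
        encode(new[] { 1, 2, 3 }, ms);
        Console.WriteLine(BitConverter.ToString(ms.ToArray())); // 01-02-03
    }
}
```

A real encoder would probably write the array length first so the decoder knows how many elements to read back, the same way the string encoder writes its byte count.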
The total implementation is less than 170 LOC. I'd be willing to bet that with another 100 LOC you could support most constructs. If you're currently a heavy user of reflection to provide object serialization, it's time to update ;)