Saturday, April 19, 2008

visualizing edb's tag structure

This post deals with visualizing the internal storage format used by the elementaldb project.
I'm in the middle of rewriting the radix-tree storage engine in the elementaldb - while speed isn't a huge goal for the project, the newly rewritten proper crit-bit tree is performing inserts slower than the old version. So I thought it'd be interesting to use Graphviz to visualize the tag structure (and also the current radix tree) implementation. The tagm module was originally written for a mail client where messages could be tagged with a specific value. The generic requirements of the tag module were: it had to perform quick appends and it had to support a sort of hierarchical structure. The result is a linked-list implementation with nodes that can have an unlimited number of children. It's nothing really fancy but provides a convenient layer on which to write other data structures (b+tree, radix or anything else really).


This image shows the result of doing ten inserts of x = [0..9]. Each node is a tag value, in the current rtree implementation a tag node can potentially have a maximum of 256 children. These are sequentially searched (I know, I know) during a rtree put operation when a split branch point is reached. This means that storing binary data is very inefficient, hence the need to write a proper crit-bit implementation.


The image above shows the resulting structure when storing in a crit-bit structure. In this case a tag can have a maximum of 3 children: 0, 1, and data.