Radial Tree by Pumpkin-Pirate Memory Layouts for Binary Search
Overview
The first round of experiments is over, but you can still help out with the second round.

Binary search is a wonderful algorithm: theoretically optimal, fast in practice, and discoverable by school children playing guessing games. Usually we think of binary search as being applied to a sorted array, and we teach it this way.

It turns out that, because of cache effects, sorted order is usually not the most efficient way to store data for searching. In this project, we study the performance of different memory layouts for binary searching:

  1. sorted: A usual sorted array with binary search applied to it.
  2. eytzinger: An implicit binary search tree packed into an array using the Eytzinger (a.k.a. BFS) layout usually seen with binary heaps.
  3. btree: An implicit B-tree packed into an array using the obvious generalization of the Eytzinger layout. The value B here is chosen so that B-1 data items fit into 64 bytes (the most common cache line width).
  4. veb: An implicit binary search tree packed into an array using the van Emde Boas layout seen in the cache-oblivious literature.
Results
Which of these array memory layouts is fastest?

The answer is complicated, and it seems to depend on the data size, the cache size, the cache line width, and the relative cache speed. In many settings B-trees (with a properly chosen value of B) are best. In others, the Eytzinger layout wins. In others, still, the van Emde Boas layout is the winner (at least for large enough array sizes).

For an example, consider the following two graphs, generated by running the same code on two different Intel machines. In the left graph, the Eytzinger layout is almost as slow as a plain sorted array while the van Emde Boas and B-tree layouts are more than twice as fast. In the right graph, the Eytzinger layout and b-tree are the fastest, the sorted array is still the slowest, and the vEB layout is somewhere in betweeen (for array sizes).

This is why I need your help. I have some theories that may explain this complicated behaviour, but they need to be validated on a larger sample of hardware. If you have a Linux machine and would like to contribute to this effort, then please follow the instructions at the top of the page.

Help Out
If you'd like to help out, then download the code (version 1 code), unpack it like this:
$ tar xzvf arraylayout.tgz 
arraylayout-0.0/
arraylayout-0.0/Makefile
arraylayout-0.0/eytzinger_array.h
arraylayout-0.0/main.cpp
arraylayout-0.0/btree_array.h
arraylayout-0.0/sorted_array.h
arraylayout-0.0/cacher.cpp
arraylayout-0.0/README
arraylayout-0.0/veb_array.h
arraylayout-0.0/base_array.h
and then run it like this:
$ cd arraylayout-0.0 ; make data
This will create file called run_data.tgz that you can just email me. If you want to be extra helpful, you can also run:
sudo dmidecode --type 17
and include the output in your email. This will give me detailed information about the amount, type, and speed of memory installed in your computer, as described here.
Some Data
Here is some (preliminary, badly-formatted) data:
Observations
These are a just a few preliminary observations from browsing the data (mainly for myself).