This is a follow-up to an experiment that I started on May 8, 2015.
If you just came to help out, then skip to the bottom.
If you just came to read about the result, then you can read the paper.
It turns out that, because of cache effects, sorted order is usually not the most efficient way to store data for searching. In this project, we study the performance of different memory layouts for binary searching:
sorted
: A usual sorted array with binary
search applied to it.eytzinger
: An implicit binary search tree
packed into an array using the Eytzinger (a.k.a. BFS) layout
usually seen with binary heaps.btree
: An implicit B-tree
packed into an array using the obvious generalization of
the Eytzinger layout. The value B here is chosen so that
B-1 data items fit into 64 bytes (the most common cache
line width).veb
: An implicit binary search tree packed
into an array using the van Emde Boas layout seen in
the cache-oblivious literature.
The last time I asked this question, my answer was the
following: The answer is complicated, and it seems to depend
on the data size, the cache size, the cache line width, and the
relative cache speed.
Since then I've been working with Paul Khuong, and we've come to the
conclusion that the answer is not really that complicated:
The Eytzinger layout offers the best all-around performance
over a wide range of array lengths. To understand why, you can
read the paper.
Spoiler: It has to do do with hiding latency by overusing bandwidth. (Hacker News user derf_9 has a nice summary.)
Most of the data we've collected so far supports our hypothesis, but it's always nice to have more data. If you have a Linux machine and would like to contribute to this effort, then you can either try to reproduce our results or you can send us more data. Instructions for both are below.
arraylayout-master
. Read the last section of the README.md
file in that directory and, if you think you're ready, run
the rcr
program in that directory.
$ tar xzvf arraylayout.tgz arraylayout-0.1/ arraylayout-0.1/Makefile arraylayout-0.1/eytzinger_array.h arraylayout-0.1/main.cpp arraylayout-0.1/btree_array.h arraylayout-0.1/sorted_array.h arraylayout-0.1/cacher.cpp arraylayout-0.1/README arraylayout-0.1/veb_array.h arraylayout-0.1/base_array.hand then run it like this:
$ cd arraylayout-0.1 ; make dataIf you're on a Mac, the procedure is similar:
$ cd arraylayout-0.1 ; make -f Makefile.macos dataThis will create file called
run_data.tgz
that you can
just email me. If you want to be extra helpful, you can also run:
sudo dmidecode --type 17and include the output in your email. This will give me detailed information about the amount, type, and speed of memory installed in your computer, as described here.
* These results are qualitatively different from those in the paper.
1 This processor suffers from out of bounds prefetching, as discussed at the end of Section 5.3.
2 This is a non x86 processor and behaves very differently.