arraylayouts: RCR output
For processor information, view the output of lscpu or /proc/cpuinfo.
Figure 4: The running time of branchy binary search and std::lower_bound().
Figure 5: The running time of branchy binary search when all data fits into L3 cache.
Figure 6: The running times of branchy binary search versus branch-free binary search when all data fits into L2 cache.
Figure 7: The running times of branchy binary search versus branch-free binary search for large values of $n$.
Figure 8: Branch-free binary search with explicit prefetching is competitive for small values of $n$ and a clear winner for large values of $n$.
Figure 10: The performance of branchy and branch-free Eytzinger search.
Figure 11: The performance of branch-free Eytzinger with prefetching.
Figure 12: Eytzinger versus binary search for small values of $n$.
Figure 13: Performance of the branch-free mixed layout (without prefetching).
Figure 14: Performance of the branch-free mixed layout with prefetching.
Figure 16: The performance of B-tree search algorithms.
Figure 17: The performance of B-tree search for small values of $n$.
Figure 18: The performance of the vEB layout.
Figure 19: The effects of deeper prefetching in the Eytzinger search algorithm.
Figure 20: Testing $(Bk+1)$-ary trees.
Figure 21: The performance of searches with two, four, and eight threads.
Figure 22: The performance of algorithms on 64-bit and 128-bit data.
Figure 24: The impact of masking prefetches.