Floating Buffer Tree

Returning to our original problem of answering a batch of queries in a online setting where we must immediately return query results, the application of the buffered R-Tree presents some problems. The first of these is that the general algorithm is designed to wait until O(m) operations have been received before a buffer emptying process is initiated, however in the online case we want to force a buffer emptying process immediately. The I/O complexity of the buffered R-Tree suffers in this case because we must load the complete subtree rooted at a buffer node, which requires O(m) I/Os, even if the queries being routed to the next buffer level touch only a small subset of the nodes in the subtree.

The floating buffer tree was designed to take advantage of the buffer tree concept to improve the efficiency in answering online queries. The general idea is that we can achieve optimal I/O efficiency if any R-Tree node is loaded into main memory just a single time to answer all K queries. Furthermore any node not needed to answer one of the K queries will never be loaded while the batch of queries is being processed.

To achieve this rather than fix buffers to nodes at certain levels of the tree buffers are allowed to 'float' and are attached to, and removed from, nodes on a as required basis. The algorithm works basically as follows. All queries in a particular batch are initially added to a buffer attached to the root node. This buffer is then added to a priority queue, where the priority of a buffer is it's depth in the tree. Next, the buffer flushing process is invoked. The buffer flushing process picks the buffer of lowest (nearest leaf level) depth in the tree from the priority queue and routes all queries from this buffer to the leaf level or possibly to another buffer. To route queries the first query is popped off the buffer and we route it, in depth first manner, to report all leaves, loading nodes from the R-Tree into main memory as necessary. If there is still available memory we can route the second query and so on until the main memory allocated for R-Tree nodes (1/2 of main memory) is exhausted. At this point we continue to route the current query (and others from the buffer being flushed) however if the search path at node v for some query q leads to a child of v that is not in main memory we fix a buffer to v, add q to the buffer and add the buffer to the priority queue. If there is already a buffer fixed to v we simply insert q into this buffer.

Figure 5 shows the floating buffer tree during the flushing stage. Below psuedo-code for the proceedure is listed along with explanatory notes.

**Figure 5:** The floating buffer tree while flushing the buffer at node v. In this example there is a buffer at node w created during an earlier flushing step. This buffer will eventually be flushed after all buffers created while flushing v are flushed because its depth is greater.

Proceedure BulkQuery( list of queries Q )
1. create new buffer at root node
2. insert all elements of Q into root buffer
3. insert root buffer into priority queue PQ with depth as priority
4. call FlushBuffers(PQ)

Proceedure FlushBuffers( buffer priority queue PQ )
1. while {PQ is not empty}
   a. get lowest buffer from PQ
   b. v <= node associated with lowest buffer
   c. for {each query in buffer of v}
      (i) for {each unvisited child of v}
          - if {child is in memory or memory is not exhausted}
              * load child into memory (if necessary)
              * Route(child, query)
          - else {memory is full}
              * if {shadow buffer t does not exist}
                  fetch t from buffer pool¹
              * insert query into t
   d. delete current buffer from v
   e. if { t is not empty }
      (i) fix t to node v
      (ii) insert t into PQ

Proceedure Route( node v, query q )
1. if {v is a leaf} 
   a. report all results
2. else 
3. a. for {each child of v}
      (i) if {child is in memory or memory is not exhausted}
         - Load child into memory if necessary
         - Route(child, q)
      (ii) else {memory is full}
         - if {v has a buffer}
             * insert query into buffer of v
         - else {v has no buffer}
            * create a new buffer and insert q²
            * fix buffer to node v
            * insert buffer into PQ
            * break from for loop³

Additional Notes on Algorithm

During the flushing process we create a 'shadow' buffer, 't', that will hold any queries that cannot be pushed down entirely from the current node (i.e. there are still branches of the current node that could not yet be loaded into mememory). At the time the shadow buffer is created we need to copy the visited list from the current buffer to the shadow buffer, and update the shadow buffers visited list to include the additional paths currently in memory.
At the time a buffer is created we need to check which children of its associated node are currently loaded in memory. These should be marked as 'visited' for the buffer so that when we flush the buffer no queries are pushed along these branches. This step need only be executed at the moment a buffer is created as pushing items onto the buffer will not impact which paths should be evaluated during the flushing stage.
Once a query is inserted into the buffer of v there is no use in examining the remaining children of v as we know these will not be loaded in main memory. This is true so long a the 'for' loops in the FlushBuffers and Route proceedures operate from left to right through the list of children

Next: Comparison Up: Efficient Batch Query Processing Previous: Buffered R-Tree

Craig Dillabaugh 2007-04-26