Parallel R-trees
Abstract
We consider the problem of exploiting parallelism to accelerate the
performance of spatial access methods and specifically, R-trees [14].
Our goal is to design a server for spatial data, so that to maximize
the throughput of range queries. This can be achieved by (a) maximizing
parallelism for large range queries, and (b) by engaging as few disks
as possible on point queries [26].
We propose a simple hardware architecture consisting of one processor
with several disks attached to it. On this architecture, we propose to
distribute the nodes of a traditional R-tree, with crossdisk pointers
('Multiplexed' R-tree). The R-tree code is identical to the one for a
single-disk R-tree, with the only addition that we have to decide
which disk a newly created R-tree node should be stored in. We propose
and examine several criteria to choose a disk for a new node. The most
successful one, termed 'proximity index' or PI, estimates the similarity
of the new node with the other Rtree nodes already on a disk, and chooses
the disk with the lowest similarity. Experimental results show that our
scheme consistently outperforms all the other heuristics for node-to-disk
assignments, achieving up to 55% gains over the Round Robin one.
Experiments also indicate that the multiplexed Rtree with PI heuristic
gives better response time than the disk-stripping (="Super-node") approach,
and imposes lighter load on the I/O sub-system.
The speed up of our method is close to linear speed up, increasing with
the size of the queries.
(Also cross-referenced as UMIACS-TR-92-1)