Parallel R-trees

Loading...
Thumbnail Image

Files

CS-TR-2820.ps (341.81 KB)
No. of downloads: 259
CS-TR-2820.pdf (223.51 KB)
No. of downloads: 778

Publication or External Link

Date

1998-10-15

Advisor

Citation

DRUM DOI

Abstract

We consider the problem of exploiting parallelism to accelerate the performance of spatial access methods and specifically, R-trees [14]. Our goal is to design a server for spatial data, so that to maximize the throughput of range queries. This can be achieved by (a) maximizing parallelism for large range queries, and (b) by engaging as few disks as possible on point queries [26].

We propose a simple hardware architecture consisting of one processor with several disks attached to it. On this architecture, we propose to distribute the nodes of a traditional R-tree, with crossdisk pointers ('Multiplexed' R-tree). The R-tree code is identical to the one for a single-disk R-tree, with the only addition that we have to decide which disk a newly created R-tree node should be stored in. We propose and examine several criteria to choose a disk for a new node. The most successful one, termed 'proximity index' or PI, estimates the similarity of the new node with the other Rtree nodes already on a disk, and chooses the disk with the lowest similarity. Experimental results show that our scheme consistently outperforms all the other heuristics for node-to-disk assignments, achieving up to 55% gains over the Round Robin one. Experiments also indicate that the multiplexed Rtree with PI heuristic gives better response time than the disk-stripping (="Super-node") approach, and imposes lighter load on the I/O sub-system.

The speed up of our method is close to linear speed up, increasing with the size of the queries. (Also cross-referenced as UMIACS-TR-92-1)

Notes

Rights