A Comparative Study of Spatial Indexing Techniques for Multidimensional Scientific Datasets

Files: CS-TR-4556.ps (449.16 KB), CS-TR-4556.pdf (173.46 KB)
Nam, Beomseok
Sussman, Alan
Scientific applications that query very large multidimensional datasets are becoming more common. These datasets grow every day and are now so large that indexing individual data elements is infeasible. We have instead been experimenting with chunking the datasets to index them: data elements are grouped into small chunks of a fixed, but dataset-specific, size to take advantage of spatial locality. While spatial indexing structures based on R-trees perform reasonably well for the rectangular bounding boxes of such chunked datasets, other indexing structures based on KDB-trees, such as Hybrid trees, have been shown to perform very well for point data. In this paper, we investigate how all these indexing structures perform for multidimensional scientific datasets, and compare their features and performance with those of SH-trees, an extension of Hybrid trees, for indexing multidimensional rectangles. Our experimental results show that the algorithms for building and searching SH-trees outperform those for R-trees, R*-trees, and X-trees for both real application and synthetic datasets and queries. We show that the SH-tree algorithms perform well for both low- and high-dimensional data, and that they scale well to high dimensions for both building and searching the trees. (UMIACS-TR-2004-03)
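
As a rough illustration of the chunking idea described in the abstract, the following Python sketch groups points into fixed-size chunks, computes each chunk's rectangular bounding box, and answers a range query by testing the boxes. It is not the SH-tree algorithm or any code from the report: the function names (chunk_points, bounding_box, range_query) and the sample data are hypothetical, the input order is assumed to already reflect spatial locality, and a plain linear scan stands in for the tree index that a real system would build over the boxes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner)


def chunk_points(points: Sequence[Point], chunk_size: int) -> List[List[Point]]:
    """Group consecutive points into chunks of at most chunk_size elements.

    Assumption: the input order already reflects spatial locality.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(points[i:i + chunk_size])
            for i in range(0, len(points), chunk_size)]


def bounding_box(chunk: Sequence[Point]) -> Box:
    """Return the minimum axis-aligned rectangle enclosing a non-empty chunk."""
    dims = range(len(chunk[0]))
    lower = tuple(min(p[d] for p in chunk) for d in dims)
    upper = tuple(max(p[d] for p in chunk) for d in dims)
    return lower, upper


def intersects(a: Box, b: Box) -> bool:
    """Two boxes overlap if their intervals overlap in every dimension."""
    return all(a[0][d] <= b[1][d] and b[0][d] <= a[1][d]
               for d in range(len(a[0])))


def range_query(boxes: Sequence[Box], query: Box) -> List[int]:
    """Linear scan over chunk boxes; a tree index would replace this loop."""
    return [i for i, box in enumerate(boxes) if intersects(box, query)]


# Hypothetical sample data: two spatially separated clusters of 2-D points.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0),
          (10.0, 10.0), (11.0, 12.0), (12.0, 11.0)]
chunks = chunk_points(points, chunk_size=3)
boxes = [bounding_box(c) for c in chunks]
hits = range_query(boxes, ((0.5, 0.5), (3.0, 3.0)))
```

Here hits holds the indices of the chunks whose bounding boxes overlap the query rectangle. Only those chunks would need to be read and filtered element by element, which is why the index only has to store one rectangle per chunk rather than one entry per data element.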