Hashing Technique: A New Index Method for High Dimensional Data

Loading...
Thumbnail Image

Files

CS-TR-4059.ps (197.6 KB)
No. of downloads: 459
CS-TR-4059.pdf (195.22 KB)
No. of downloads: 644

Publication or External Link

Date

1999-09-25

Advisor

Citation

DRUM DOI

Abstract

When dimension goes high, sequential scan processing becomes more efficient than most index-based query. In this paper, we propose a new index method for high-dimensional data spaces. This method is based on hashing technique. The basic idea is: First find a hashing function which puts the given d-dimensional space data into a d'-dimensional buckets where d' << d. Then, we use existing index techniques to manage those buckets. We later define some properties of a good hashing function and give four hashing functions. To demonstrate the efficiency of our idea, we experimentally compared our algorithms with sequential scan and Pyramid-Techniques. The results demonstrate that this method outperforms others for skewed data set. It always beats the sequential scan by using only half of elapsed time for range query. However if the data has uniform distribution, Pyramid-Technique is still the best method.

Notes

Rights