Hashing Technique: A New Index Method for High Dimensional Data
Hashing Technique: A New Index Method for High Dimensional Data
Loading...
Files
Publication or External Link
Date
1999-09-25
Authors
Song, Zhexuan
Roussopoulos, Nick
Advisor
Citation
DRUM DOI
Abstract
When dimension goes high, sequential scan processing becomes more
efficient than most index-based query. In this paper, we propose a new
index method for high-dimensional data spaces. This method is based on
hashing technique. The basic idea is: First find a hashing function which
puts the given d-dimensional space data into a d'-dimensional buckets
where d' << d. Then, we use existing index techniques to manage those
buckets. We later define some properties of a good hashing function and
give four hashing functions. To demonstrate the efficiency of our idea,
we experimentally compared our algorithms with sequential scan and
Pyramid-Techniques. The results demonstrate that this method outperforms
others for skewed data set. It always beats the sequential scan by using
only half of elapsed time for range query. However if the data has uniform
distribution, Pyramid-Technique is still the best method.