Hashing Technique: A New Index Method for High Dimensional Data

Song, Zhexuan; Roussopoulos, Nick

Hashing Technique: A New Index Method for High Dimensional Data

Files

CS-TR-4059.ps (197.6 KB)

No. of downloads: 463

CS-TR-4059.pdf (195.22 KB)

No. of downloads: 651

Date

1999-09-25

Authors

Song, Zhexuan

Roussopoulos, Nick

Abstract

When dimension goes high, sequential scan processing becomes more efficient than most index-based query. In this paper, we propose a new index method for high-dimensional data spaces. This method is based on hashing technique. The basic idea is: First find a hashing function which puts the given d-dimensional space data into a d'-dimensional buckets where d' << d. Then, we use existing index techniques to manage those buckets. We later define some properties of a good hashing function and give four hashing functions. To demonstrate the efficiency of our idea, we experimentally compared our algorithms with sequential scan and Pyramid-Techniques. The results demonstrate that this method outperforms others for skewed data set. It always beats the sequential scan by using only half of elapsed time for range query. However if the data has uniform distribution, Pyramid-Technique is still the best method.

URI (handle)

http://hdl.handle.net/1903/509

Collections

Technical Reports of the Computer Science Department

Full item page