On the Efficiency of Nearest Neighbor Searching with Data Clustered in Lower Dimensions

dc.contributor.authorManeewongvatana, Songriten_US
dc.contributor.authorMount, David M.en_US
dc.date.accessioned2004-05-31T21:09:19Z
dc.date.available2004-05-31T21:09:19Z
dc.date.created2001-02en_US
dc.date.issued2001-05-10en_US
dc.description.abstractIn nearest neighbor searching we are given a set of n data points in real d-dimensional space, R^d, and the problem is to preprocess these points into a data structure, so that given a query point, the nearest data point to the query point can be reported efficiently. Because data sets can be quite large, we are interested in data structures that use optimal O(dn) storage. Given the limitation of linear storage, the best known data structures suffer from expected-case query times that grow exponentially in d. However, it is widely regarded in practice that data sets in high dimensional spaces tend to consist of clusters residing in much lower dimensional subspaces. This raises the question of whether data structures for nearest neighbor searching adapt to the presence of lower dimensional clustering, and further how performance varies when the clusters are aligned with the coordinate axes. We analyze the popular kd-tree data structure in the form of two variants based on a modification of the splitting method, which produces cells satisfy the basic packing properties needed for efficiency without producing empty cells. We show that when data points are uniformly distributed on a k-dimensional hyperplane for k <= d, then expected number of leaves visited in such a kd-tree grows exponentially in k, but not in d. We show that the growth rate is even smaller still if the hyperplane is aligned with the coordinate axes. We present empirical studies to support our theoretical results.en_US
dc.format.extent2301672 bytes
dc.format.mimetypeapplication/postscript
dc.identifier.urihttp://hdl.handle.net/1903/520
dc.language.isoen_US
dc.relation.isAvailableAtDigital Repository at the University of Marylanden_US
dc.relation.isAvailableAtUniversity of Maryland (College Park, Md.)en_US
dc.relation.isAvailableAtTech Reports in Computer Science and Engineeringen_US
dc.relation.isAvailableAtComputer Science Department Technical Reportsen_US
dc.relation.ispartofseriesUM Computer Science Department; CS-TR-4209en_US
dc.titleOn the Efficiency of Nearest Neighbor Searching with Data Clustered in Lower Dimensionsen_US
dc.typeTechnical Reporten_US

Files

Original bundle
Now showing 1 - 2 of 2
No Thumbnail Available
Name:
CS-TR-4209.ps
Size:
2.2 MB
Format:
Postscript Files
Loading...
Thumbnail Image
Name:
CS-TR-4209.pdf
Size:
311.75 KB
Format:
Adobe Portable Document Format
Description:
Auto-generated copy of CS-TR-4209.ps