Algorithms and Data Structures for Faster Nearest-Neighbor Classification

Flores Velazco, Alejandro

Algorithms and Data Structures for Faster Nearest-Neighbor Classification

dc.contributor.advisor	Mount, David	en_US
dc.contributor.author	Flores Velazco, Alejandro	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2022-09-20T05:34:57Z
dc.date.available	2022-09-20T05:34:57Z
dc.date.issued	2022	en_US
dc.description.abstract	Given a set P of n labeled points in a metric space (X,d), the nearest-neighbor rule classifies an unlabeled query point q ∈ X with the class of q's closest point in P. Despite the advent of more sophisticated techniques, nearest-neighbor classification is still fundamental for many machine-learning applications. Over the years, this~has motivated numerous research aiming to reduce its high dependency on the size and dimensionality of the data. This dissertation presents various approaches to reduce the dependency of the nearest-neighbor rule from n to some smaller parameter k, that describes the intrinsic complexity of the class boundaries of P. This is of particular significance as it is usually assumed that k ≪ n on real-world training sets. One natural way to achieve this dependency reduction is to reduce the training set itself, selecting a subset R ⊆ P to be used by the nearest-neighbor rule~to~answer incoming queries, instead of using P. Evidently, this approach would reduce the dependencies of the nearest-neighbor rule from n, the size of P, to the size of R. This dissertation explores different techniques to select subsets whose sizes are proportional to k, and that provide varying degrees of correct classification guarantees. Another alternative involves bypassing training set reduction, and instead building data structures designed to answer classification queries directly. To this end, this dissertation proposes the Chromatic AVD; a Quadtree-based data structure designed to answer ε-approximate nearest-neighbor classification queries. The query time and space complexities of this data structure depend on k_ε; a generalization of k that describes the intrinsic complexity of the ε-approximate class boundaries of P.	en_US
dc.identifier	https://doi.org/10.13016/vnvd-gplg
dc.identifier.uri	http://hdl.handle.net/1903/29224
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Information science	en_US
dc.subject.pquncontrolled	algorithms	en_US
dc.subject.pquncontrolled	classification	en_US
dc.subject.pquncontrolled	data structures	en_US
dc.subject.pquncontrolled	nearest-neighbor rule	en_US
dc.title	Algorithms and Data Structures for Faster Nearest-Neighbor Classification	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Now showing 1 - 1 of 1

Name:: FloresVelazco_umd_0117E_22588.pdf
Size:: 6.34 MB
Format:: Adobe Portable Document Format

Download

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations