Computer Science Theses and Dissertations

Permanent URI for this collectionhttp://hdl.handle.net/1903/2756

Browse

Search Results

Now showing 1 - 3 of 3
  • Thumbnail Image
    Item
    Algorithms and Data Structures for Faster Nearest-Neighbor Classification
    (2022) Flores Velazco, Alejandro; Mount, David; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Given a set P of n labeled points in a metric space (X,d), the nearest-neighbor rule classifies an unlabeled query point q ∈ X with the class of q's closest point in P. Despite the advent of more sophisticated techniques, nearest-neighbor classification is still fundamental for many machine-learning applications. Over the years, this~has motivated numerous research aiming to reduce its high dependency on the size and dimensionality of the data. This dissertation presents various approaches to reduce the dependency of the nearest-neighbor rule from n to some smaller parameter k, that describes the intrinsic complexity of the class boundaries of P. This is of particular significance as it is usually assumed that k ≪ n on real-world training sets. One natural way to achieve this dependency reduction is to reduce the training set itself, selecting a subset R ⊆ P to be used by the nearest-neighbor rule~to~answer incoming queries, instead of using P. Evidently, this approach would reduce the dependencies of the nearest-neighbor rule from n, the size of P, to the size of R. This dissertation explores different techniques to select subsets whose sizes are proportional to k, and that provide varying degrees of correct classification guarantees. Another alternative involves bypassing training set reduction, and instead building data structures designed to answer classification queries directly. To this end, this dissertation proposes the Chromatic AVD; a Quadtree-based data structure designed to answer ε-approximate nearest-neighbor classification queries. The query time and space complexities of this data structure depend on k_ε; a generalization of k that describes the intrinsic complexity of the ε-approximate class boundaries of P.
  • Thumbnail Image
    Item
    Approximate Range Searching In The Absolute Error Model
    (2007-11-28) Fonseca, Guilherme Dias da; Mount, David M; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Range searching is a well known problem in computational geometry. We consider this problem in the context of approximation, where an approximation parameter eps > 0 is provided. Most prior work on this problem has focused on the relative error model, where each range shape R is bounded, and points within distance eps diam(R) of the range's boundary may or may not be included. We introduce a different approximation model, called the absolute error model, in which points within distance eps of the range's boundary may or may not be included, regardless of the diameter of the range. We consider sets of ranges consisting of general convex bodies, axis-aligned rectangles, halfspaces, Euclidean balls, and simplices. We examine a variety of problem formulations, including range searching under general commutative semigroups, idempotent semigroups, groups, range emptiness, and range reporting. We apply our data structures to several related problems, including range sketching, approximate nearest neighbor searching, exact idempotent range searching, approximate range searching in the data stream model, and approximate range searching in the relative model.
  • Thumbnail Image
    Item
    Spatial Modeling using Triangular, Tetrahedral, and Pentatopic Decompositions
    (2006-04-28) Lee, Michael Thomas; Samet, Hanan; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Techniques are described for facilitating operations for spatial modeling using triangular, tetrahedral, and pentatopic decompositions of the underlying domain. In the case of terrain data, techniques are presented for navigating between adjacent triangles of a hierarchical triangle mesh where the triangles are obtained by a recursive quadtree-like subdivision of the underlying space into four equilateral triangles. We describe a labeling technique for the triangles which is useful in implementing the quadtree triangle mesh as a linear quadtree (i.e., a pointer-less quadtree). The navigation can then take place in this linear quadtree. This results in algorithms that have a worst-case constant time complexity, as they make use of a fixed number of bit manipulation operations. In the case of volumetric data, we consider a multi-resolution representation based on a decomposition of a field domain into nested tetrahedral cells generated by recursive tetrahedron bisection, that we call a Hierarchy of Tetrahedra (HT). We describe our implementation of an HT, and discuss how to extract conforming meshes from an HT so as to avoid discontinuities in the approximation of the associated scalar field. This is accomplished by using worst-case constant time neighbor finding algorithms. We also present experimental results in connection with a set of basic queries for performing analysis of volume data sets at different levels of detail. In the case of four-dimensional data which can include time as the fourth dimension, we present a multi-resolution representation of a four-dimensional scalar field based on a recursive decomposition of a hypercubic domain into a hierarchy of nested four-dimensional simplexes, that we call a Hierarchy of Pentatopes (HP). This structure allows us to generate conforming meshes that avoid discontinuities in the corresponding approximation of the associated scalar field. Neighbor finding is an important part of this process and using our structure, it is possible to find neighbors in worst-case constant time by using bit manipulation operations, thereby avoiding traversing the hierarchy.