University of Maryland Libraries
Digital Repository at the University of Maryland

    Computational Mid-Level Vision: From Border Ownership to Categorical Object Recognition

    Teo_umd_0117E_16577.pdf (97.72Mb)
    No. of downloads: 165

    Date
    2015
    Author
    Teo, Ching Lik
    Advisor
    Aloimonos, John
    DRUM DOI
    https://doi.org/10.13016/M2R13Z
    Abstract
    Since it was proposed in 1890 by Christian von Ehrenfels, Gestalt psychology has remained a key school of thought that explains how one perceives the world ("the whole") from the sum of its individual components ("the parts") or processes. These processes are aptly summarized in the well-known "Rules of Gestalt". In spite of their influence in other fields, the empirical nature of the Gestalt rules impedes their widespread adoption in Computer Science. This thesis serves to bridge this apparent divide by making Mid-level Vision, or Computer Vision based on Gestalt rules, not only computationally feasible but also practical for real applications. We address the general problem of figure-ground organization, where the goal is to separate the foreground (or object) from the background. To do this, we first formulate a fast approach that pairs Structured Random Forests (SRFs) with Gestalt-like features, for both boundary detection and border ownership assignment. We then show how border ownership information is useful for shape-based recognition of object categories. This is done by embedding ownership information into the image torque, a grouping operator that detects closure patterns in image edges, so that the operator can be modulated efficiently to detect class-specific contours under clutter and occlusion. Next, we show how symmetry, an important shape-based regularity in Gestalt psychology, can be detected in clutter and used to guide the segmentation of symmetric foreground regions. Besides shape and symmetry, functionality is another important mid-level cue that supports categorical object recognition. Based on Gibson's principle of affordance, we introduce a fast technique based on an SRF trained with geometric features that provides pixel-accurate affordances of tool parts. Finally, we describe as future work how language can be exploited to "activate" such mid-level processes so that a joint semantic space can be obtained for linking visual concepts to language. Such a space could help solve even more challenging problems in Computer Vision, effectively reducing the so-called "semantic gap" between these two related domains.
    URI
    http://hdl.handle.net/1903/17201
    Collections
    • Computer Science Theses and Dissertations
    • UMD Theses and Dissertations

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
     

     
