Computational Mid-Level Vision: From Border Ownership to Categorical Object Recognition

Teo, Ching Lik

dc.contributor.advisor	Aloimonos, John	en_US
dc.contributor.author	Teo, Ching Lik	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2016-02-06T06:31:03Z
dc.date.available	2016-02-06T06:31:03Z
dc.date.issued	2015	en_US
dc.description.abstract	Since it was proposed in 1890 by Christian von Ehrenfels, Gestalt psychology has remained a key school of thought that explains how one perceives the world ("the whole") from the sum of its individual components ("the parts") or processes. These processes are aptly summarized in the well-known "Rules of Gestalt". In spite of their influence in other fields, the empirical nature of Gestalt rules impedes their widespread adoption in Computer Science. This thesis serves to bridge this apparent divide by making Mid-level Vision, or Computer Vision based on Gestalt rules, not only computationally feasible but also practical for real applications. We address the general problem of figure-ground organization, where the goal is to separate the foreground (or object) from the background. To do this, we first formulate a fast approach that pairs Structured Random Forests (SRFs) with Gestalt-like features, for both boundary detection and border ownership assignment. We then show how border ownership information is useful for shape-based recognition of object categories. This is done by embedding ownership information into the image torque, a grouping operator that detects closure patterns in image edges, so that the operator can be modulated efficiently to detect class-specific contours under clutter and occlusion. Next, we show how symmetry, an important shape-based regularity in Gestalt psychology, can be detected in clutter and used to guide the segmentation of symmetric foreground regions. Besides shape and symmetry, functionality is another important mid-level cue that supports categorical object recognition. Based on Gibson's principle of affordance, we introduce a fast technique based on an SRF trained with geometric features that provides pixel-accurate affordances of tool parts.
Finally, we describe as future work how language can be exploited to "activate" such mid-level processes so that a joint semantic space can be obtained for linking visual concepts to language to solve even more challenging problems in Computer Vision, effectively reducing the so-called "semantic gap" between these two related domains.	en_US
dc.identifier	https://doi.org/10.13016/M2R13Z
dc.identifier.uri	http://hdl.handle.net/1903/17201
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Artificial intelligence	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pqcontrolled	Robotics	en_US
dc.subject.pquncontrolled	Affordance detection	en_US
dc.subject.pquncontrolled	Border ownership	en_US
dc.subject.pquncontrolled	Computer vision	en_US
dc.subject.pquncontrolled	Figure-ground organization	en_US
dc.subject.pquncontrolled	Mid-level vision	en_US
dc.subject.pquncontrolled	Symmetry	en_US
dc.title	Computational Mid-Level Vision: From Border Ownership to Categorical Object Recognition	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Name: Teo_umd_0117E_16577.pdf
Size: 97.73 MB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations