Computational Mid-Level Vision: From Border Ownership to Categorical Object Recognition

dc.contributor.advisor: Aloimonos, John [en_US]
dc.contributor.author: Teo, Ching Lik [en_US]
dc.description.abstract: Since it was proposed in 1890 by Christian von Ehrenfels, Gestalt psychology has remained a key school of thought that explains how one perceives the world ("the whole") from the sum of its individual components ("the parts") or processes. These processes are aptly summarized in the well-known "Rules of Gestalt". In spite of its influence in other fields, the empirical nature of Gestalt rules impedes their widespread adoption in Computer Science. This thesis bridges this apparent divide by making Mid-level Vision, or Computer Vision based on Gestalt rules, not only computationally feasible but also practical for real applications. We address the general problem of figure-ground organization, where the goal is to separate the foreground (or object) from the background. To do this, we first formulate a fast approach that pairs Structured Random Forests (SRFs) with Gestalt-like features for both boundary detection and border ownership assignment. We then show how border ownership information is useful for shape-based recognition of object categories. This is done by embedding ownership information into the image torque, a grouping operator that detects closure patterns in image edges, so that the operator can be modulated efficiently to detect class-specific contours under clutter and occlusion. Next, we show how symmetry, an important shape-based regularity in Gestalt psychology, can be detected in clutter and used to guide segmentation of symmetric foreground regions. Besides shape and symmetry, functionality is another important mid-level cue that supports categorical object recognition. Based on Gibson's principle of affordance, we introduce a fast technique based on an SRF trained with geometric features that provides pixel-accurate affordances of tool parts.
Finally, we describe as future work how language can be exploited to "activate" such mid-level processes so that a joint semantic space can be obtained for linking visual concepts to language, in order to solve even more challenging problems in Computer Vision and effectively reduce the so-called "semantic gap" between these two related domains. [en_US]
dc.title: Computational Mid-Level Vision: From Border Ownership to Categorical Object Recognition [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.contributor.department: Computer Science [en_US]
dc.subject.pqcontrolled: Artificial intelligence [en_US]
dc.subject.pqcontrolled: Computer science [en_US]
dc.subject.pquncontrolled: Affordance detection [en_US]
dc.subject.pquncontrolled: Border ownership [en_US]
dc.subject.pquncontrolled: Computer vision [en_US]
dc.subject.pquncontrolled: Figure-ground organization [en_US]
dc.subject.pquncontrolled: Mid-level vision [en_US]
