
BRIDGING THE SEMANTIC GAP: IMAGE AND VIDEO UNDERSTANDING BY EXPLOITING ATTRIBUTES

dc.contributor.advisor: Aloimonos, Yiannis (en_US)
dc.contributor.author: Yu, Xiaodong (en_US)
dc.date.accessioned: 2014-02-04T06:30:57Z
dc.date.available: 2014-02-04T06:30:57Z
dc.date.issued: 2013 (en_US)
dc.identifier.uri: http://hdl.handle.net/1903/14769
dc.description.abstract: Understanding images and videos is one of the fundamental problems in computer vision. Traditionally, research in this area focused on extracting low-level features from images and videos and learning classifiers that map these features to pre-defined classes of objects, scenes, or activities. However, it is well known that a "semantic gap" exists between low-level features and high-level semantic concepts, and this gap greatly obstructs progress in image and video understanding. Our work departs from the traditional view in that we add a middle layer, called the attribute layer, between high-level concepts and low-level features, and use this layer to facilitate the description of concepts and the detection of entities in images and videos. On one hand, attributes are relatively simple and can therefore be detected more reliably from low-level features; on the other hand, we can exploit high-level knowledge about the relationships between attributes and high-level concepts, as well as among the attributes themselves, thereby reducing the semantic gap. We demonstrate these ideas in three applications. First, we present an attribute-based learning approach for object recognition, in which attributes transfer knowledge about object properties from known classes to unknown classes, consequently reducing the number of training examples needed to learn new object classes. Next, we describe an active framework that recognizes scenes from the objects they contain, which are treated as the attributes of the scenes. The active framework exploits the correlation among objects in a scene and thus significantly reduces the number of objects that must be detected in order to recognize the scene.
Finally, we propose a novel approach to detecting activity attributes in sports videos, where contextual constraints are exploited to reduce the ambiguity of attribute detection. The activity attributes enable us to go beyond naming activity categories and achieve a fine-grained description of the activities in the videos. (en_US)
dc.language.iso: en (en_US)
dc.title: BRIDGING THE SEMANTIC GAP: IMAGE AND VIDEO UNDERSTANDING BY EXPLOITING ATTRIBUTES (en_US)
dc.type: Dissertation (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.contributor.department: Electrical Engineering (en_US)
dc.subject.pqcontrolled: Electrical engineering (en_US)
dc.subject.pqcontrolled: Computer science (en_US)
dc.subject.pquncontrolled: Attribute (en_US)
dc.subject.pquncontrolled: Image and Video Understanding (en_US)
dc.subject.pquncontrolled: Semantic Gap (en_US)

