Context Models for Understanding Images and Videos

dc.contributor.advisor: Davis, Larry S. (en_US)
dc.contributor.author: Nagaraja, Varun K. (en_US)
dc.contributor.department: Computer Science (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2016-09-03T05:42:49Z
dc.date.available: 2016-09-03T05:42:49Z
dc.date.issued: 2016 (en_US)
dc.description.abstract: A computer vision system that interacts in natural language needs to understand the visual appearance of interactions between objects as well as the appearance of the objects themselves. Relationships between objects are frequently mentioned in queries for tasks such as semantic image retrieval, image captioning, visual question answering, and natural language object detection. Hence, modeling context between objects is essential for solving these tasks. In the first part of this thesis, we present a technique for detecting an object mentioned in a natural language query. Specifically, we work with referring expressions, which are sentences that identify a particular object instance in an image. In many referring expressions, an object is described in relation to another object using prepositions, comparative adjectives, action verbs, etc. Our proposed technique identifies both the referred object and the context object mentioned in such expressions. Context is also useful for incrementally understanding scenes and videos. In the second part of this thesis, we propose techniques for searching for objects in an image and for events in a video. Our incremental algorithms use context from previously explored regions to prioritize the regions to explore next. The advantage of incremental understanding is that it limits the computation time and/or resources spent on various detection tasks. The first technique shows how to learn context in indoor scenes implicitly and use it to search for objects; the second shows how explicitly written context rules for one-on-one basketball can be used to sequentially detect events in a game. (en_US)
dc.identifier: https://doi.org/10.13016/M29V36
dc.identifier.uri: http://hdl.handle.net/1903/18603
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Computer science (en_US)
dc.title: Context Models for Understanding Images and Videos (en_US)
dc.type: Dissertation (en_US)
