Show simple item record

dc.contributor.advisor: Jacobs, David (en_US)
dc.contributor.author: Sun, Jin (en_US)
dc.date.accessioned: 2018-09-07T05:42:49Z
dc.date.available: 2018-09-07T05:42:49Z
dc.date.issued: 2018 (en_US)
dc.identifier: https://doi.org/10.13016/M2HQ3S260
dc.identifier.uri: http://hdl.handle.net/1903/21164
dc.description.abstract: Object detection is one of the fundamental problems in computer vision, with great practical impact. Current object detectors work well under certain conditions; however, challenges arise as scenes become more complex. Scenes are often cluttered, and object detectors trained on Internet-collected data fail when objects' appearance varies widely. We believe the key to tackling these challenges is to understand the rich context of objects in scenes, which includes: the appearance variations of an object due to changes in viewpoint and lighting; the relationships between objects and their typical environment; and the composition of multiple objects in the same scene. This dissertation studies the complexity of scenes from these aspects. To facilitate collecting training data with large variations, we design a novel user interface, ARLabeler, that harnesses the power of Augmented Reality (AR) devices. Instead of passively labeling images from the Internet, we put an observer in the real world with full control over scene complexity. Users walk around freely and observe objects from multiple angles; lighting can be adjusted; and objects can be added to or removed from the scene to create rich compositions. Our tool opens new possibilities for preparing data for complex scenes. We also study the challenges of deploying object detectors in real-world scenes: detecting curb ramps in street-view images. We propose a system, Tohme, that combines results from automatic detectors with human crowdsourced verification. One core component is a meta-classifier that estimates the complexity of a scene and assigns it to a human (accurate but costly) or a computer (low cost but error-prone) accordingly. One insight from Tohme is that context is crucial for detecting objects.

To understand the complex relationship between objects and their environment, we propose a standalone context model that predicts where an object can occur in an image. Combined with an object detector, this model can find regions where an object is missing; it can also be used to find out-of-context objects. To take a step beyond single-object detection, we explicitly model the geometric relationships among groups of objects and use this layout information to represent scenes as a whole. We show that this strategy is useful for retrieving indoor furniture scenes from natural-language queries. (en_US)
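The abstract describes Tohme's meta-classifier as routing each scene to either human verification or the automatic detector based on an estimated scene complexity. The routing idea can be sketched as below; the function name, threshold value, and scores are hypothetical illustrations, not details taken from the dissertation.

```python
def route_scene(complexity_score, threshold=0.5):
    """Route a scene based on an estimated complexity in [0, 1].

    Complex scenes go to crowd workers (accurate but costly);
    simple scenes are left to the detector (cheap but error-prone).
    """
    return "human" if complexity_score >= threshold else "computer"

# Raising the threshold sends fewer scenes to the crowd,
# lowering cost at the risk of more detector errors.
assignments = [route_scene(s) for s in (0.2, 0.7, 0.9)]
```

The threshold encodes the cost/accuracy trade-off the abstract alludes to: it can be tuned to a labeling budget rather than fixed in advance.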
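The second contribution pairs a standalone context model (where an object *can* occur) with a detector (where an object *is* found) to flag missed and out-of-context objects. A minimal sketch of that comparison logic follows; the function, thresholds, and score ranges are assumptions for illustration only.

```python
def flag_region(context_prob, detector_score,
                p_hi=0.8, p_lo=0.2, d_thr=0.5):
    """Compare a context model's occurrence probability with a detector score.

    - High context probability but no confident detection -> possible miss.
    - Confident detection in a low-probability region     -> out of context.
    - Otherwise the two models agree.
    """
    if context_prob >= p_hi and detector_score < d_thr:
        return "possible miss"
    if context_prob <= p_lo and detector_score >= d_thr:
        return "out of context"
    return "consistent"
```

In this framing the context model acts as a prior over image regions, and disagreement between prior and detector is what surfaces both failure modes the abstract mentions.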
dc.language.iso: en (en_US)
dc.title: FINDING OBJECTS IN COMPLEX SCENES (en_US)
dc.type: Dissertation (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.contributor.department: Computer Science (en_US)
dc.subject.pqcontrolled: Computer science (en_US)
dc.subject.pquncontrolled: accessibility assessment (en_US)
dc.subject.pquncontrolled: augmented reality (en_US)
dc.subject.pquncontrolled: computer vision (en_US)
dc.subject.pquncontrolled: context (en_US)
dc.subject.pquncontrolled: image retrieval (en_US)
dc.subject.pquncontrolled: object detection (en_US)


