FINDING OBJECTS IN COMPLEX SCENES

Sun, Jin

FINDING OBJECTS IN COMPLEX SCENES

dc.contributor.advisor	Jacobs, David	en_US
dc.contributor.author	Sun, Jin	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2018-09-07T05:42:49Z
dc.date.available	2018-09-07T05:42:49Z
dc.date.issued	2018	en_US
dc.description.abstract	Object detection is one of the fundamental problems in computer vision that has great practical impact. Current object detectors work well under certain con- ditions. However, challenges arise when scenes become more complex. Scenes are often cluttered and object detectors trained on Internet collected data fail when there are large variations in objects’ appearance. We believe the key to tackle those challenges is to understand the rich context of objects in scenes, which includes: the appearance variations of an object due to viewpoint and lighting condition changes; the relationships between objects and their typical environment; and the composition of multiple objects in the same scene. This dissertation aims to study the complexity of scenes from those aspects. To facilitate collecting training data with large variations, we design a novel user interface, ARLabeler, utilizing the power of Augmented Reality (AR) devices. Instead of labeling images from the Internet passively, we put an observer in the real world with full control over the scene complexities. Users walk around freely and observe objects from multiple angles. Lighting can be adjusted. Objects can be added and/or removed to the scene to create rich compositions. Our tool opens new possibilities to prepare data for complex scenes. We also study challenges in deploying object detectors in real world scenes: detecting curb ramps in street view images. A system, Tohme, is proposed to combine detection results from detectors and human crowdsourcing verifications. One core component is a meta-classifier that estimates the complexity of a scene and assigns it to human (accurate but costly) or computer (low cost but error-prone) accordingly. One of the insights from Tohme is that context is crucial in detecting objects. To understand the complex relationship between objects and their environment, we propose a standalone context model that predicts where an object can occur in an image. By combining this model with object detection, it can find regions where an object is missing. It can also be used to find out-of-context objects. To take a step beyond single object based detections, we explicitly model the geometrical relationships between groups of objects and use the layout information to represent scenes as a whole. We show that such a strategy is useful in retrieving indoor furniture scenes with natural language inputs.	en_US
dc.identifier	https://doi.org/10.13016/M2HQ3S260
dc.identifier.uri	http://hdl.handle.net/1903/21164
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pquncontrolled	accessibility assessment	en_US
dc.subject.pquncontrolled	augmented reality	en_US
dc.subject.pquncontrolled	computer vision	en_US
dc.subject.pquncontrolled	context	en_US
dc.subject.pquncontrolled	image retrieval	en_US
dc.subject.pquncontrolled	object detection	en_US
dc.title	FINDING OBJECTS IN COMPLEX SCENES	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Sun_umd_0117E_19377.pdf
Size:: 24.88 MB
Format:: Adobe Portable Document Format

Download

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations