Computer Science Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2756

  • FINDING OBJECTS IN COMPLEX SCENES
    (2018) Sun, Jin; Jacobs, David; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Object detection is one of the fundamental problems in computer vision and has great practical impact. Current object detectors work well under certain conditions. However, challenges arise when scenes become more complex. Scenes are often cluttered, and object detectors trained on Internet-collected data fail when there are large variations in objects' appearance. We believe the key to tackling these challenges is to understand the rich context of objects in scenes, which includes: the appearance variations of an object due to changes in viewpoint and lighting conditions; the relationships between objects and their typical environment; and the composition of multiple objects in the same scene. This dissertation studies the complexity of scenes from those aspects. To facilitate collecting training data with large variations, we design a novel user interface, ARLabeler, that utilizes the power of Augmented Reality (AR) devices. Instead of passively labeling images from the Internet, we put an observer in the real world with full control over the scene complexities. Users walk around freely and observe objects from multiple angles. Lighting can be adjusted. Objects can be added to or removed from the scene to create rich compositions. Our tool opens new possibilities for preparing data for complex scenes. We also study the challenges of deploying object detectors in real-world scenes through the task of detecting curb ramps in street view images. A system, Tohme, is proposed to combine results from detectors with human crowdsourced verification. One core component is a meta-classifier that estimates the complexity of a scene and assigns it to a human (accurate but costly) or a computer (low cost but error-prone) accordingly. One of the insights from Tohme is that context is crucial in detecting objects. To understand the complex relationship between objects and their environment, we propose a standalone context model that predicts where an object can occur in an image. Combined with object detection, this model can find regions where an object is missing. It can also be used to find out-of-context objects. To take a step beyond single-object detection, we explicitly model the geometric relationships between groups of objects and use the layout information to represent scenes as a whole. We show that such a strategy is useful in retrieving indoor furniture scenes from natural language inputs.
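
The idea of pairing a context model with a detector can be illustrated with a small sketch. This is not the dissertation's model: the function name `compare_context_and_detections`, the grid-of-scores representation, and the `hi`/`lo` thresholds are all assumptions made for illustration. It only shows how disagreement between "where an object is expected" and "where a detector fired" might be read as either a missing object or an out-of-context object.

```python
def compare_context_and_detections(context_prob, det_score, hi=0.7, lo=0.3):
    """Compare two same-sized grids of scores in [0, 1].

    context_prob[r][c]: hypothetical context-model score that an object
                        could plausibly occur in cell (r, c).
    det_score[r][c]:    hypothetical detector confidence for cell (r, c).
    Returns (missing, out_of_context), each a list of (row, col) cells.
    The thresholds hi and lo are illustrative assumptions.
    """
    if len(context_prob) != len(det_score) or any(
        len(c_row) != len(d_row) for c_row, d_row in zip(context_prob, det_score)
    ):
        raise ValueError("both grids must have the same shape")

    missing, out_of_context = [], []
    for r, (c_row, d_row) in enumerate(zip(context_prob, det_score)):
        for col, (c, d) in enumerate(zip(c_row, d_row)):
            if c >= hi and d <= lo:
                # Context expects an object here, but the detector saw none.
                missing.append((r, col))
            elif d >= hi and c <= lo:
                # Detector fired where context says an object is unlikely.
                out_of_context.append((r, col))
    return missing, out_of_context


context = [[0.9, 0.1],
           [0.8, 0.2]]
detections = [[0.95, 0.9],
              [0.10, 0.05]]
missing, odd = compare_context_and_detections(context, detections)
print(missing)  # cells where an expected object was not detected
print(odd)      # cells with a detection that context does not support
```

Keeping the two score maps separate until this comparison step mirrors the abstract's description of a standalone context model, though how the dissertation actually combines them is not specified here.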
  • Techniques for Image Retrieval: Deformation Insensitivity and Automatic Thumbnail Cropping
    (2006-08-03) Ling, Haibin; Jacobs, David W; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    We study several problems in image retrieval systems. These problems and the proposed techniques are divided into three parts. Part I: This part focuses on robust object representation, which is of fundamental importance in computer vision. We target this problem without using specific object models, which allows us to develop methods that can be applied to many different problems. Three approaches are proposed that are insensitive to different kinds of object or image changes. First, we propose using the inner-distance, defined as the length of the shortest path within the shape boundary, to build articulation-insensitive shape descriptors. Second, a deformation-insensitive framework for image matching is presented, along with a deformation-insensitive descriptor based on geodesic distances on image surfaces. Third, we use a gradient orientation pyramid as a robust face image representation and apply it to the task of face verification across ages. Part II: This part concentrates on comparing histogram-based descriptors that are widely used in image retrieval. We first present an improved algorithm for the Earth Mover's Distance (EMD), which is a popular dissimilarity measure between histograms. The new algorithm is an order of magnitude faster than the original EMD algorithms. Then, motivated by the new algorithm, we design a diffusion-based distance that is more straightforward and efficient. The efficiency and effectiveness of the proposed approaches are validated in experiments on both shape recognition and interest point matching tasks, using both synthetic and real data. Part III: This part studies the thumbnail generation problem, which has wide application in visualization tasks. Traditionally, thumbnails are generated by shrinking the original images. These thumbnails are often illegible because of their limited size. We study the ability of computer vision systems to detect key components of images so that intelligent cropping, prior to shrinking, can render objects more recognizable. With this idea, we propose an automatic thumbnail cropping technique based on the distribution of pixel saliency in an image. The proposed approach is tested in a carefully designed user study, which shows that the cropped thumbnails are substantially more recognizable and easier to find in the context of visual search.
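
Cropping based on the distribution of pixel saliency can be sketched as follows. This is a minimal brute-force illustration, not the algorithm from the thesis: the function name `smallest_salient_crop`, the precomputed saliency grid, and the rule "take the smallest rectangle holding a given `fraction` of the total saliency" are assumptions chosen to make the idea concrete.

```python
def smallest_salient_crop(saliency, fraction=0.8):
    """Return (top, left, bottom, right) of the smallest-area rectangle that
    contains at least `fraction` of the total saliency.

    `saliency` is a non-empty 2D list of non-negative numbers (assumed to be
    precomputed by some saliency model). bottom and right are exclusive.
    Brute force over all rectangles; fine for small maps only.
    """
    rows, cols = len(saliency), len(saliency[0])

    # integral[r][c] holds the sum of saliency[0:r][0:c] (summed-area table).
    integral = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            integral[r + 1][c + 1] = (saliency[r][c] + integral[r][c + 1]
                                      + integral[r + 1][c] - integral[r][c])

    target = fraction * integral[rows][cols]
    best, best_area = (0, 0, rows, cols), rows * cols  # fall back to full image
    for top in range(rows):
        for bottom in range(top + 1, rows + 1):
            for left in range(cols):
                for right in range(left + 1, cols + 1):
                    area = (bottom - top) * (right - left)
                    if area >= best_area:
                        continue  # cannot improve on the current best
                    mass = (integral[bottom][right] - integral[top][right]
                            - integral[bottom][left] + integral[top][left])
                    if mass >= target:
                        best, best_area = (top, left, bottom, right), area
    return best


saliency_map = [[0, 0, 0, 0],
                [0, 5, 4, 0],
                [0, 3, 0, 0],
                [1, 0, 0, 0]]
print(smallest_salient_crop(saliency_map))  # → (1, 1, 3, 3)
```

In this toy map the 2×2 window around the bright cells holds 12 of the 13 saliency units, so it is the smallest crop that reaches the 80% target; the crop would then be shrunk to thumbnail size. A practical system would need a faster search than this exhaustive scan.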