Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay before a given thesis/dissertation appears in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 5 of 5
  • Efficient Image Segmentation and Segment-Based Analysis in Computer Vision Applications
    (2015) Soares, Joao V. B.; Jacobs, David W; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This dissertation focuses on efficient image segmentation and segment-based object recognition in computer vision applications. Special attention is devoted to analyzing shape, which is of particular importance for our two applications: plant species identification from leaf photos, and object classification in remote sensing images. Additionally, both problems are constrained by efficiency requirements, which limit the choice of applicable methods: leaf recognition results are to be used within an interactive system, while remote sensing image analysis must scale well over very large image sets. Leafsnap was the first mobile app to provide automatic recognition of tree species and currently has over 1.7 million downloads. We present an overview of the mobile app and corresponding back end recognition system, as well as a preliminary analysis of user-submitted data. More than 1.7 million valid leaf photos have been uploaded by users, 1.3 million of which are GPS-tagged. We then focus on the problem of segmenting photos of leaves taken against plain light-colored backgrounds. These types of photos are used in practice within Leafsnap for tree species recognition. A good segmentation is essential in order to make use of the distinctive shape of leaves for recognition. We present a comparative experimental evaluation of several segmentation methods, including quantitative and qualitative results. We then introduce a custom-tailored leaf segmentation method that shows superior performance while maintaining computational efficiency. The other contribution of this work is a set of attributes for analysis of image segments. The set of attributes is designed for use in knowledge-based systems, so they are selected to be intuitive and easily describable. The attributes can also be computed efficiently, to allow applicability across different problems. We experiment with several descriptive measures from the literature and encounter certain limitations, leading us to introduce new attribute formulations and more efficient computational methods. Finally, we experiment with the attribute set on our two applications: plant species identification from leaf photos and object recognition in remote sensing images.
  • Learning Visual Patterns: Imposing Order on Objects, Trajectories and Networks
    (2011) Farrell, Ryan; Davis, Larry S; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This work considers the understanding of observed visual patterns in static images and dynamic scenes, which is fundamental to many tasks in the field of computer vision. Within this broad domain, we focus on three particular subtasks, contributing novel solutions to: (a) the subordinate categorization of objects (avian species specifically), (b) the analysis of multi-agent interactions using the agent trajectories, and (c) the estimation of camera network topology. In contrast to object recognition, where the presence or absence of certain parts is generally indicative of basic-level category, the problem of subordinate categorization rests on the ability to establish salient distinctions amongst the characteristics of those parts which comprise the basic-level category. Focusing on an avian domain due to the fine-grained structure of the category taxonomy, we explore a pose-normalized appearance model based on a volumetric poselet scheme. The variation in shape and appearance properties of these parts across a taxonomy provides the cues needed for subordinate categorization. Our model associates the underlying image pattern parameters used for detection with corresponding volumetric part location, scale and orientation parameters. These parameters implicitly define a mapping from the image pixels into a pose-normalized appearance space, removing view and pose dependencies and facilitating fine-grained categorization with relatively few training examples. We next examine the problem of leveraging trajectories to understand interactions in dynamic multi-agent environments. We focus on perceptual tasks, those for which an agent's behavior is governed largely by the individuals and objects around them. We introduce kinetic accessibility, a model for evaluating the perceived, and thus anticipated, movements of other agents. This new model is then applied to the analysis of basketball footage. The kinetic accessibility measures are coupled with low-level visual cues and domain-specific knowledge for determining which player has possession of the ball and for recognizing events such as passes, shots and turnovers. Finally, we present two differing approaches for estimating camera network topology. The first technique seeks to partition a set of observations made in the camera network into individual object trajectories. As exhaustive consideration of the partition space is intractable, partitions are considered incrementally, adding observations while pruning unlikely partitions. Partition likelihood is determined by the evaluation of a probabilistic graphical model, balancing the consistency of appearances across a hypothesized trajectory with the latest predictions of camera adjacency. A primary benefit of estimating object trajectories is that higher-order statistics, as opposed to just first-order adjacency, can be derived, yielding resilience to camera failure and the potential for improved tracking performance between cameras. Unlike the first, centralized technique, the second takes a decentralized approach, estimating the global network topology with local computations using sequential Bayesian estimation on a modified multinomial distribution. Key to this method is an information-theoretic appearance model for observation weighting. The inherently distributed nature of the approach allows the simultaneous utilization of all sensors as processing agents in collectively recovering the network topology.
  • Techniques for Image Retrieval: Deformation Insensitivity and Automatic Thumbnail Cropping
    (2006-08-03) Ling, Haibin; Jacobs, David W; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    We study several problems in image retrieval systems. These problems and proposed techniques are divided into three parts. Part I: This part focuses on robust object representation, which is of fundamental importance in computer vision. We target this problem without using specific object models. This allows us to develop methods that can be applied to many different problems. Three approaches are proposed that are insensitive to different kinds of object or image changes. First, we propose using the inner-distance, defined as the length of the shortest path within the shape boundary, to build articulation insensitive shape descriptors. Second, a deformation insensitive framework for image matching is presented, along with an insensitive descriptor based on geodesic distances on image surfaces. Third, we use a gradient orientation pyramid as a robust face image representation and apply it to the task of face verification across ages. Part II: This part concentrates on comparing histogram-based descriptors that are widely used in image retrieval. We first present an improved algorithm for computing the Earth Mover's Distance (EMD), which is a popular dissimilarity measure between histograms. The new algorithm is one order faster than original EMD algorithms. Then, motivated by the new algorithm, a diffusion-based distance is designed that is more straightforward and efficient. The efficiency and effectiveness of the proposed approaches are validated in experiments on both shape recognition and interest point matching tasks, using both synthetic and real data. Part III: This part studies the thumbnail generation problem, which has wide application in visualization tasks. Traditionally, thumbnails are generated by shrinking the original images. These thumbnails are often illegible due to size limitations. We study the ability of computer vision systems to detect key components of images so that intelligent cropping, prior to shrinking, can render objects more recognizable. With this idea, we propose an automatic thumbnail cropping technique based on the distribution of pixel saliency in an image. The proposed approach is tested in a carefully designed user study, which shows that the cropped thumbnails are substantially more recognizable and easier to find in the context of visual search.
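For readers unfamiliar with the Earth Mover's Distance mentioned in Part II, the following is a minimal sketch of the well-known one-dimensional special case only. It assumes two histograms over the same bins, equal total mass, and a ground distance of one unit between adjacent bins; under those assumptions the EMD reduces to the sum of absolute differences between the cumulative histograms. It is not the improved algorithm proposed in the dissertation, and the function name emd_1d is a hypothetical one chosen for illustration.

```python
from itertools import accumulate


def emd_1d(p, q, tol=1e-9):
    """Earth Mover's Distance between two 1D histograms.

    Assumes (for illustration only) that p and q share the same bins,
    carry the same total mass, and that adjacent bins are one unit apart.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > tol:
        raise ValueError("histograms must have equal total mass")
    # The running sum of (p - q) is the mass that has to cross each bin
    # boundary; its absolute value is the work done at that boundary.
    carried = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(c) for c in carried)


if __name__ == "__main__":
    # All mass moves two bins to the right.
    print(emd_1d([1, 0, 0], [0, 0, 1]))
```

General (multi-dimensional) histograms require solving a transportation problem, which is where the cost of the original EMD algorithms comes from.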
  • Appearance modeling under geometric context for object recognition in videos
    (2006-08-03) Li, Jian; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Object recognition is a very important high-level task in surveillance applications. This dissertation focuses on building appearance models for object recognition and exploring the relationship between shape and appearance for two key types of objects, humans and vehicles. The dissertation proposes a generic framework that models the appearance while incorporating certain geometric prior information, the so-called geometric context. Then, under this framework, special methods are developed for recognizing humans and vehicles based on their appearance and shape attributes in surveillance videos. The first part of the dissertation presents a unified framework based on a general definition of geometric transform (GeT), which is applied to modeling object appearances under geometric context. The GeT models the appearance by applying designed functionals over certain geometric sets. The GeT unifies the Radon transform, the trace transform, image warping, etc. Moreover, five novel types of GeTs are introduced and applied to fingerprinting the appearance inside a contour. They include GeT based on level sets, GeT based on shape matching, GeT based on feature curves, GeT invariant to occlusion, and a multi-resolution GeT (MRGeT) that combines both shape and appearance information. The second part focuses on how to use the GeT to build appearance models for objects like walking humans, which have articulated motion of body parts. This part also illustrates the application of GeT for object recognition, image segmentation, video retrieval, and image synthesis. The proposed approach produces promising results when applied to automatic body part segmentation and fingerprinting the appearance of a human and body parts despite the presence of non-rigid deformations and articulated motion. It is very important to understand the 3D structure of vehicles in order to recognize them. To reconstruct the 3D model of a vehicle, the third part presents a factorization method for structure from planar motion. Experimental results show that the algorithm is accurate and fairly robust to noise and inaccurate calibration. Differences and the dual relationship between planar motion and planar object are also clarified in this part. Based on our method, a fully automated vehicle reconstruction system has been designed.
  • Deterministic Annealing for Correspondence, Pose, and Recognition
    (2006-04-27) David, Philip John; DeMenthon, Daniel; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The problem of determining the pose - the position and orientation - of an object given a model and an image of that object is a fundamental problem in computer vision. Applications include object recognition, object tracking, site inspection and updating, and autonomous navigation when scene models are available. The pose of an object is readily determined given a few correspondences between features in the image and features in the model. Conversely, corresponding model and image features can easily be determined if the pose of the object is known. However, when neither the pose nor the correspondences are known, the problem of determining either is difficult because a small change in an object's pose can result in a large change in its appearance. Most existing techniques approach this as a combinatorial optimization problem in which the space of model-to-image feature correspondences is searched in order to find object poses that are supported by large numbers of image features. These approaches, however, are only practical when the level of clutter and occlusion in the image is small, which is often not the case in real-world environments. This dissertation presents new algorithms that simultaneously determine the pose and feature correspondences of 2D and 3D objects from images containing large amounts of clutter and occlusion. Objects are modeled as sets of 2D or 3D points or line segments, and image features consist of either points or line segments. In each of the algorithms presented, deterministic annealing is used to convert a discrete combinatorial optimization problem into a continuous one that is indexed by a control parameter. This has two advantages. First, it allows solutions to the simpler continuous problem to slowly transform into a solution to the discrete problem. Second, many local minima are avoided by minimizing an objective function that is highly smoothed during the early phases of the optimization but which gradually transforms into the original objective function and constraints at the end of the optimization. These algorithms perform well in experiments involving highly cluttered synthetic and real imagery.
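The general idea of deterministic annealing described in this abstract can be illustrated with a deliberately simplified toy problem. The sketch below is not one of the dissertation's algorithms: it assumes the pose is a pure 2D translation, that there is no clutter or occlusion, and it uses a plain row-normalized soft assignment. The function name anneal_translation and the parameters beta0, beta_max and rate are hypothetical. The control parameter beta starts small, so every model point is softly matched to all image points, and is then increased so that the soft matches gradually harden into one-to-one correspondences.

```python
import math


def anneal_translation(model, image, beta0=0.001, beta_max=10.0, rate=1.5):
    """Toy deterministic annealing: estimate t such that model + t ~ image.

    Correspondences are unknown. Illustrative assumptions: the pose is a
    2D translation only, and every model point has a true match in image.
    """
    tx, ty = 0.0, 0.0
    beta = beta0
    while beta <= beta_max:
        sum_x = sum_y = 0.0
        for mx, my in model:
            # Squared distance from the translated model point to each image point.
            d2 = [(mx + tx - ix) ** 2 + (my + ty - iy) ** 2 for ix, iy in image]
            d2_min = min(d2)  # subtracted for numerical stability
            weights = [math.exp(-beta * (d - d2_min)) for d in d2]
            total = sum(weights)
            # Soft correspondence: each image point votes for a translation.
            for (ix, iy), w in zip(image, weights):
                sum_x += (w / total) * (ix - mx)
                sum_y += (w / total) * (iy - my)
        tx, ty = sum_x / len(model), sum_y / len(model)
        beta *= rate  # raise beta so the soft assignments gradually harden
    return tx, ty


if __name__ == "__main__":
    model = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (7.0, 13.0)]
    # Image points: the model shifted by (3, -2), listed in a different order.
    image = [(10.0, 11.0), (3.0, 8.0), (13.0, -2.0), (3.0, -2.0)]
    print(anneal_translation(model, image))
```

At small beta the objective is heavily smoothed and the estimate depends only on the overall point distributions; as beta grows, each model point's weight concentrates on a single image point, which corresponds to the discrete correspondence problem the abstract refers to.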