UMD Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/3

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a four-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 4 of 4
  • Item
    Towards segmentation into surfaces
    (2010) Bitsakos, Konstantinos; Aloimonos, Yiannis; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Image segmentation is a fundamental problem in low-level computer vision and is also used as a preprocessing step for a number of higher-level tasks (e.g., object detection and recognition, action classification, and optical flow and stereo computation). In this dissertation we study the image segmentation problem, focusing on the task of segmentation into surfaces. First, we present a unifying framework through which mean shift, bilateral filtering and anisotropic diffusion can be described. Three new methods are also described and implemented, and the most prominent of them, called Color Mean Shift (CMS), is extensively tested and compared against the existing methods. We show experimentally that CMS outperforms the other methods, i.e., it creates more uniform regions while retaining the edges between segments equally well. Next, we argue that color-based segmentation should be a two-stage process: edge-preserving filtering, followed by pixel clustering. We create novel segmentation algorithms by coupling the previously described filtering methods with standard grouping techniques. We compare all the segmentation methods with current state-of-the-art grouping methods and show that they produce better results on the Berkeley and Weizmann segmentation datasets. A number of other interesting conclusions are also drawn from the comparison. Then we focus on surface normal estimation techniques. We present two novel methods to estimate the parameters of a planar surface viewed by a moving robot whose odometry is known. We also present a way of combining them and integrating the measurements over time using an extended Kalman filter. We test the estimation accuracy by demonstrating the ability of the system to navigate in an indoor environment using vision exclusively. We conclude the dissertation with a discussion of how color-based segmentation can be integrated into a structure-from-motion framework that computes planar surfaces using homographies.
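The two-stage recipe above (edge-preserving filtering, then pixel clustering) builds on filters such as mean shift. The following is a minimal pure-Python sketch of mean-shift filtering in the joint spatial-range domain; it is a generic illustration, not the dissertation's Color Mean Shift, and the test image, bandwidths, and iteration cap are made up.

```python
# Minimal sketch of mean-shift filtering on a grayscale image.
# hs = spatial bandwidth (window half-size), hr = range bandwidth.
# Each pixel's joint (row, col, intensity) point is shifted to the mean
# of its neighbors within the window until it converges.

def mean_shift_filter(img, hs=2, hr=30.0, max_iter=10):
    """Filter a 2-D grayscale image given as a list of lists of numbers."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            y, x, v = float(r), float(c), float(img[r][c])
            for _ in range(max_iter):
                sy = sx = sv = n = 0.0
                # Average over pixels inside the spatial window whose
                # intensity lies within the range bandwidth hr of v.
                for rr in range(max(0, int(y) - hs), min(rows, int(y) + hs + 1)):
                    for cc in range(max(0, int(x) - hs), min(cols, int(x) + hs + 1)):
                        if abs(img[rr][cc] - v) <= hr:
                            sy += rr; sx += cc; sv += img[rr][cc]; n += 1
                ny, nx, nv = sy / n, sx / n, sv / n
                converged = abs(nv - v) < 0.5 and abs(ny - y) < 0.5 and abs(nx - x) < 0.5
                y, x, v = ny, nx, nv
                if converged:
                    break
            out[r][c] = v
    return out

# A noisy step edge: filtering makes each side uniform but keeps the
# edge, because the two sides differ by more than hr.
img = [[10, 12, 11, 90, 92, 91],
       [11, 10, 12, 91, 90, 92],
       [12, 11, 10, 92, 91, 90]]
smooth = mean_shift_filter(img)
```

This is the sense in which such filters "create more uniform regions while retaining edges": averaging only happens among pixels that are already similar in intensity.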
  • Item
    Appearance modeling under geometric context for object recognition in videos
    (2006-08-03) Li, Jian; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Object recognition is an important high-level task in surveillance applications. This dissertation focuses on building appearance models for object recognition and on exploring the relationship between shape and appearance for two key types of objects: humans and vehicles. The dissertation proposes a generic framework that models appearance while incorporating certain geometric prior information, the so-called geometric context. Under this framework, special methods are developed for recognizing humans and vehicles based on their appearance and shape attributes in surveillance videos. The first part of the dissertation presents a unified framework based on a general definition of the geometric transform (GeT), which is applied to modeling object appearances under geometric context. The GeT models appearance by applying designed functionals over certain geometric sets, and it unifies the Radon transform, the trace transform, image warping, and others. Moreover, five novel types of GeTs are introduced and applied to fingerprinting the appearance inside a contour: GeT based on level sets, GeT based on shape matching, GeT based on feature curves, GeT invariant to occlusion, and a multi-resolution GeT (MRGeT) that combines both shape and appearance information. The second part focuses on how to use the GeT to build appearance models for objects such as walking humans, which have articulated motion of body parts. This part also illustrates the application of GeT to object recognition, image segmentation, video retrieval, and image synthesis. The proposed approach produces promising results when applied to automatic body-part segmentation and to fingerprinting the appearance of a human and body parts despite the presence of non-rigid deformations and articulated motion. Understanding the 3D structure of vehicles is essential for recognizing them.
To reconstruct the 3D model of a vehicle, the third part presents a factorization method for structure from planar motion. Experimental results show that the algorithm is accurate and fairly robust to noise and inaccurate calibration. Differences and the dual relationship between planar motion and a planar object are also clarifieded in this part. Based on our method, a fully automated vehicle reconstruction system has been designed.
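For context on factorization methods like the one described above, here is a sketch of the classic Tomasi-Kanade rank-3 factorization for orthographic cameras, a standard baseline related to, but not the same as, the planar-motion factorization in the dissertation. The scene, camera count, and point count are made up.

```python
import numpy as np

# Build a toy measurement matrix: P world points seen by F orthographic
# cameras, stacked as a 2F x P matrix of image coordinates.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 8))                 # 8 world points
frames = []
for _ in range(4):                              # 4 camera poses
    R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    frames.append(R[:2] @ X)                    # orthographic projection
W = np.vstack(frames)                           # 2F x P measurement matrix

# Subtract each row's mean (this removes translation); the centered
# matrix has rank 3, so a truncated SVD splits it into motion times
# structure, each recovered up to a shared affine ambiguity.
W -= W.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(W)
M_hat = U[:, :3] * np.sqrt(s[:3])               # camera motion factor
S_hat = np.sqrt(s[:3])[:, None] * Vt[:3]        # 3-D structure factor
```

The rank-3 property is what makes factorization work: noise raises the smaller singular values, and robustness (which the abstract reports experimentally) amounts to how well the dominant three survive.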
  • Item
    A holistic approach to structure from motion
    (2006-07-23) Ji, Hui; Aloimonos, Yiannis; Fermuller, Cornelia; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This dissertation investigates the general structure-from-motion problem: how to compute, in an unconstrained environment, the 3D scene structure, camera motion and moving objects from video sequences. We present a framework which uses concatenated feedback loops to overcome the main difficulty in the structure-from-motion problem: the chicken-and-egg dilemma between scene segmentation and structure recovery. The idea is to compute structure and motion in stages, gradually recovering 3D scene information of increasing complexity using processes that operate on increasingly large spatial image areas. Within this framework, we develop three modules. First, we introduce a new constraint for the estimation of shape using image features from multiple views. We analyze this constraint and show that noise leads to unavoidable mis-estimation of the shape, which also predicts erroneous shape perception in humans. This insight provides a clear argument for the need for feedback loops. Second, a novel constraint on shape is developed which allows us to connect multiple frames in the estimation of camera motion by matching only small image patches. Third, we present a texture descriptor for matching areas of extended size. The advantage of this texture descriptor, which is based on fractal geometry, lies in its invariance to any smooth mapping (bi-Lipschitz transform), including changes of viewpoint, illumination and surface distortion. Finally, we apply our framework to the problem of super-resolution imaging. We use the 3D motion estimation together with a novel wavelet-based reconstruction scheme to reconstruct a high-resolution image from a sequence of low-resolution images.
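The texture descriptor above is based on fractal geometry. The box-counting dimension illustrates the underlying notion (and, like the descriptor, it is invariant under bi-Lipschitz maps); this is a generic sketch, not the dissertation's descriptor, and the scales and test sets are made up.

```python
import math

def box_count_dimension(points, scales=(1, 2, 4, 8, 16)):
    """Estimate the fractal dimension of a set of 2-D points in [0,1)^2."""
    logs = []
    for s in scales:
        # Count occupied cells on an s x s grid over the unit square.
        boxes = {(int(px * s), int(py * s)) for px, py in points}
        logs.append((math.log(s), math.log(len(boxes))))
    # Dimension = least-squares slope of log(box count) vs log(scale).
    n = len(logs)
    mx = sum(a for a, _ in logs) / n
    my = sum(b for _, b in logs) / n
    num = sum((a - mx) * (b - my) for a, b in logs)
    den = sum((a - mx) ** 2 for a, _ in logs)
    return num / den

# Sanity checks: a line segment is 1-dimensional, a filled grid of the
# unit square is 2-dimensional.
line = [(i / 1000, i / 1000) for i in range(1000)]
square = [(i / 32, j / 32) for i in range(32) for j in range(32)]
```

The bi-Lipschitz invariance follows because such maps distort box sizes by at most a bounded factor, which shifts the log-log intercept but not the slope.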
  • Item
    Structure from Motion on Textures: Theory and Application to Calibration
    (2005-04-04) Baker, Patrick Terry; Aloimonos, Yiannis J; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This dissertation introduces new mathematical constraints that enable us, for the first time, to investigate the correspondence problem using texture rather than points and lines. These three multilinear constraints are formulated on parallel equidistant lines embedded in a plane. We choose these sets of parallel lines as proxies for Fourier harmonics embedded in a plane, a sort of "texture atom" from which arbitrarily textured surfaces in the world can be built up. By decomposing textures in a Fourier sense rather than as points and lines, we can use these new constraints in place of the standard multifocal constraints such as the epipolar and trifocal constraints. We propose some mechanisms for a possible feedback solution to the correspondence problem. As the major application of these constraints, we describe a multicamera calibration system, written in C and MATLAB, which will be made available to the public. We describe the operation of the program and give some preliminary results.
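The abstract treats parallel equidistant lines as proxies for Fourier harmonics on a plane. Here is a toy illustration of that view (an assumed setup, not the dissertation's multilinear constraints): synthesize a "texture atom" of parallel crests, then recover its spacing and orientation from the dominant 2-D Fourier harmonic.

```python
import numpy as np

n = 64
ky0, kx0 = 4, 3                      # chosen harmonic: 4 cycles in y, 3 in x
yy, xx = np.mgrid[0:n, 0:n]
tex = np.cos(2 * np.pi * (kx0 * xx + ky0 * yy) / n)  # parallel equidistant crests

F = np.abs(np.fft.fft2(tex))
F[0, 0] = 0.0                        # ignore the DC term
ky, kx = np.unravel_index(np.argmax(F), F.shape)
ky = ky - n if ky > n // 2 else ky   # unwrap to signed frequencies
kx = kx - n if kx > n // 2 else kx
spacing = n / np.hypot(kx, ky)       # distance between crests, in pixels
orientation = np.degrees(np.arctan2(ky, kx))         # direction of the crest normal
```

A single harmonic carries exactly the information a set of parallel equidistant lines does: one frequency vector, i.e. a spacing and an orientation, which is what makes the lines a natural texture atom.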