Electrical & Computer Engineering Theses and Dissertations

Permanent URI for this collectionhttp://hdl.handle.net/1903/2765

Browse

Search Results

Now showing 1 - 6 of 6
  • Item
    Video Processing with Additional Information
    (2010) Ramachandran, Mahesh; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Cameras are frequently deployed along with many additional sensors in aerial and ground-based platforms. Many video datasets have metadata containing measurements from inertial sensors, GPS units, etc. Hence the development of better video processing algorithms using additional information attains special significance. We first describe an intensity-based algorithm for stabilizing low resolution and low quality aerial videos. The primary contribution is the idea of minimizing the discrepancy in the intensity of selected pixels between two images. This is an application of inverse compositional alignment for registering images of low resolution and low quality, for which minimizing the intensity difference over salient pixels with high gradients results in faster and better convergence than when using all the pixels. Secondly, we describe a feature-based method for stabilization of aerial videos and segmentation of small moving objects. We use the coherency of background motion to jointly track features through the sequence. This enables accurate tracking of large numbers of features in the presence of repetitive texture, lack of well conditioned feature windows etc. We incorporate the segmentation problem within the joint feature tracking framework and propose the first combined joint-tracking and segmentation algorithm. The proposed approach enables highly accurate tracking, and segmentation of feature tracks that is used in a MAP-MRF framework for obtaining dense pixelwise labeling of the scene. We demonstrate competitive moving object detection in challenging video sequences of the VIVID dataset containing moving vehicles and humans that are small enough to cause background subtraction approaches to fail. Structure from Motion (SfM) has matured to a stage, where the emphasis is on developing fast, scalable and robust algorithms for large reconstruction problems. The availability of additional sensors such as inertial units and GPS along with video cameras motivate the development of SfM algorithms that leverage these additional measurements. In the third part, we study the benefits of the availability of a specific form of additional information - the vertical direction (gravity) and the height of the camera both of which can be conveniently measured using inertial sensors, and a monocular video sequence for 3D urban modeling. We show that in the presence of this information, the SfM equations can be rewritten in a bilinear form. This allows us to derive a fast, robust, and scalable SfM algorithm for large scale applications. The proposed SfM algorithm is experimentally demonstrated to have favorable properties compared to the sparse bundle adjustment algorithm. We provide experimental evidence indicating that the proposed algorithm converges in many cases to solutions with lower error than state-of-art implementations of bundle adjustment. We also demonstrate that for the case of large reconstruction problems, the proposed algorithm takes lesser time to reach its solution compared to bundle adjustment. We also present SfM results using our algorithm on the Google StreetView research dataset, and several other datasets.
  • Item
    Robust and Efficient Inference of Scene and Object Motion in Multi-Camera Systems
    (2009) Sankaranarayanan, Aswin C; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Multi-camera systems have the ability to overcome some of the fundamental limitations of single camera based systems. Having multiple view points of a scene goes a long way in limiting the influence of field of view, occlusion, blur and poor resolution of an individual camera. This dissertation addresses robust and efficient inference of object motion and scene in multi-camera and multi-sensor systems. The first part of the dissertation discusses the role of constraints introduced by projective imaging towards robust inference of multi-camera/sensor based object motion. We discuss the role of the homography and epipolar constraints for fusing object motion perceived by individual cameras. For planar scenes, the homography constraints provide a natural mechanism for data association. For scenes that are not planar, the epipolar constraint provides a weaker multi-view relationship. We use the epipolar constraint for tracking in multi-camera and multi-sensor networks. In particular, we show that the epipolar constraint reduces the dimensionality of the state space of the problem by introducing a ``shared'' state space for the joint tracking problem. This allows for robust tracking even when one of the sensors fail due to poor SNR or occlusion. The second part of the dissertation deals with challenges in the computational aspects of tracking algorithms that are common to such systems. Much of the inference in the multi-camera and multi-sensor networks deal with complex non-linear models corrupted with non-Gaussian noise. Particle filters provide approximate Bayesian inference in such settings. We analyze the computational drawbacks of traditional particle filtering algorithms, and present a method for implementing the particle filter using the Independent Metropolis Hastings sampler, that is highly amenable to pipelined implementations and parallelization. We analyze the implementations of the proposed algorithm, and in particular concentrate on implementations that have minimum processing times. The last part of the dissertation deals with the efficient sensing paradigm of compressing sensing (CS) applied to signals in imaging, such as natural images and reflectance fields. We propose a hybrid signal model on the assumption that most real-world signals exhibit subspace compressibility as well as sparse representations. We show that several real-world visual signals such as images, reflectance fields, videos etc., are better approximated by this hybrid of two models. We derive optimal hybrid linear projections of the signal and show that theoretical guarantees and algorithms designed for CS can be easily extended to hybrid subspace-compressive sensing. Such methods reduce the amount of information sensed by a camera, and help in reducing the so called data deluge problem in large multi-camera systems.
  • Item
    SIMULTANEOUS MULTI-VIEW FACE TRACKING AND RECOGNITION IN VIDEO USING PARTICLE FILTERING
    (2009) Seo, Naotoshi; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Recently, face recognition based on video has gained wide interest especially due to its role in surveillance systems. Video-based recognition has superior advantages over image-based recognition because a video contains image sequences as well as temporal information. However, surveillance videos are generally of low-resolution and contain faces mostly in non-frontal poses. We propose a multi-view, video-based face recognition algorithm using the Bayesian inference framework. This method represents an appearance of each subject by a complex nonlinear appearance manifold expressed as a collection of simpler pose manifolds and the connections, represented by transition probabilities, among them. A Bayesian inference formulation is introduced to utilize the temporal information in the video via the transition probabilities among pose manifolds. The Bayesian inference formulation realizes video-based face recognition by progressively accumulating the recognition confidences in frames. The accumulation step possibly enables to solve face recognition problems in low-resolution videos, and the progressive characteristic is especially useful for a real-time processing. Furthermore, this face recognition framework has another characteristic that does not require processing all frames in a video if enough recognition confidence is accumulated in an intermediate frame. This characteristic gives an advantage over batch methods in terms of a computational efficiency. Furthermore, we propose a simultaneous multi-view face tracking and recognition algorithm. Conventionally, face recognition in a video is performed in tracking-then-recognition scenario that extracts the best facial image patch in the tracking and then recognizes the identity of the facial image. Simultaneous face tracking and recognition works in a different fashion, by handling both tracking and recognition simultaneously. Particle filter is a technique for implementing a Bayesian inference filter by Monte Carlo simulation, which has gained prevalence in the visual tracking literature since the Condensation algorithm was introduced. Since we have proposed a video-based face recognition algorithm based on the Bayesian inference framework, it is easy to integrate the particle filter tracker and our proposed recognition method into one, using the particle filter for both tracking and recognition simultaneously. This simultaneous framework utilizes the temporal information in a video for not only tracking but also recognition by modeling the dynamics of facial poses. Although the time series formulation remains more general, only the facial pose dynamics is utilized for recognition in this thesis.
  • Item
    Visual Tracking of Human Hand and Head Movements and Its Applications
    (2007-04-26) Sepehri, Afshin; Chellappa, Rama; Davis, Larry; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Tracking of human body movements is an important problem in computer vision with applications in visual surveillance and human-computer interaction. Tracking of a single hand moving in space is addressed and a set of applications in human computer interaction are presented. In this approach, a disparity map and motion fields extracted from a stereo camera set are modelled using a robust estimation method. Then, the absolute position and orientation of the hand in space are estimated and the central region of the hand is tracked over time. Virtual drawing in space, a virtual marble game, and 3D object construction are shown as the applications of the single hand tracking. Algorithms are presented for tracking the hands and head of a person or several interacting people viewed by a set of cameras in 3D. The problem is first defined as a general multiple object tracking problem in a multiple sensor environment and a two layered solution is proposed. The proposed solution includes a low level particle filtering layer to track individual targets in parallel, and a finite state machine to analyze the interactions between the targets and apply application specific heuristics. A set of activity recognition experiments in visual surveillance show the usefulness of the system. The recognized activities involve interactions between the hands and head of people and objects. A color analysis scheme and a technique for combining information from different cameras are presented. They are used to detect carried objects and exchanges between the hands.
  • Item
    SYMMETRY IN HUMAN MOTION ANALYSIS: THEORY AND EXPERIMENTS
    (2006-06-27) Ran, Yang; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Video based human motion analysis has been actively studied over the past decades. We propose novel approaches that are able to analyze human motion under such challenges and apply them to surveillance and security applications. Part I analyses the cyclic property of human motion and presents algorithms to classify humans in videos by their gait patterns. Two approaches are proposed. The first employs the omputationally efficient periodogram, to characterize periodicity. In order to integrate shape and motion, we convert the cyclic pattern into a binary sequence using the angle between two legs when the toe-to-toe distance is maximized during walking. Part II further extends the previous approaches to analyze the symmetry in articulation within a stride. A feature that has been shown in our work to be a particularly strong indicator of the presence of pedestrians is the X-junction generated by bipedal swing of body limbs. The proposed algorithm extracts the patterns in spatio-temporal surfaces. In Part III, we present a compact characterization of human gait and activities. Our approach is based on decomposing an image sequence into x-t slices, which generate twisted patterns defined as the Double Helical Signature (DHS). It is shown that the patterns sufficiently characterize human gait and a class of activities. The features of DHS are: (1) it naturally codes appearance and kinematic parameters of human motion; (2) it reveals an inherent geometric symmetry (Frieze Group); and (3) it is effective and efficient for recovering gait and activity parameters. Finally, we use the DHS to classify activities such as carrying a backpack, briefcase etc. The advantage of using DHS is that we only need a small portion of 3D data to recognize various symmetries.
  • Item
    Adaptive Analysis and Processing of Structured Multilingual Documents
    (2006-01-19) Ma, Huanfeng; Chellappa, Rama; Doermann, David S; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Digital document processing is becoming popular for application to office and library automation, bank and postal services, publishing houses and communication management. In recent years, the demand for tools capable of searching written and spoken sources of multilingual information has increased tremendously, where the bilingual dictionary is one of the important resource to provide the required information. Processing and analysis of bilingual dictionaries brought up the challenges of dealing with many different scripts, some of which are unknown to the designer. A framework is presented to adaptively analyze and process structured multilingual documents, where adaptability is applied to every step. The proposed framework involves: (1) General word-level script identification using Gabor filter. (2) Font classification using the grating cell operator. (3) General word-level style identification using Gaussian mixture model. (4) An adaptable Hindi OCR based on generalized Hausdorff image comparison. (5) Retargetable OCR with automatic training sample creation and its applications to different scripts. (6) Bootstrapping entry segmentation, which segments each page into functional entries for parsing. Experimental results working on different scripts, such as Chinese, Korean, Arabic, Devanagari, and Khmer, demonstrate that the proposed framework can save human efforts significantly by making each phase adaptive.