Electrical & Computer Engineering Theses and Dissertations

Permanent URI for this collectionhttp://hdl.handle.net/1903/2765

Browse

Search Results

Now showing 1 - 10 of 60
  • Item
    Activity Representation from Video Using Statistical Models on Shape Manifolds
    (2010) Abdelkader, Mohamed F.; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Activity recognition from video data is a key computer vision problem with applications in surveillance, elderly care, etc. This problem is associated with modeling a representative shape which contains significant information about the underlying activity. In this dissertation, we represent several approaches for view-invariant activity recognition via modeling shapes on various shape spaces and Riemannian manifolds. The first two parts of this dissertation deal with activity modeling and recognition using tracks of landmark feature points. The motion trajectories of points extracted from objects involved in the activity are used to build deformation shape models for each activity, and these models are used for classification and detection of unusual activities. In the first part of the dissertation, these models are represented by the recovered 3D deformation basis shapes corresponding to the activity using a non-rigid structure from motion formulation. We use a theory for estimating the amount of deformation for these models from the visual data. We study the special case of ground plane activities in detail because of its importance in video surveillance applications. In the second part of the dissertation, we propose to model the activity by learning an affine invariant deformation subspace representation that captures the space of possible body poses associated with the activity. These subspaces can be viewed as points on a Grassmann manifold. We propose several statistical classification models on Grassmann manifold that capture the statistical variations of the shape data while following the intrinsic Riemannian geometry of these manifolds. The last part of this dissertation addresses the problem of recognizing human gestures from silhouette images. We represent a human gesture as a temporal sequence of human poses, each characterized by a contour of the associated human silhouette. The shape of a contour is viewed as a point on the shape space of closed curves and, hence, each gesture is characterized and modeled as a trajectory on this shape space. We utilize the Riemannian geometry of this space to propose a template-based and a graphical-based approaches for modeling these trajectories. The two models are designed in such a way to account for the different invariance requirements in gesture recognition, and also capture the statistical variations associated with the contour data.
  • Item
    Video Processing with Additional Information
    (2010) Ramachandran, Mahesh; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Cameras are frequently deployed along with many additional sensors in aerial and ground-based platforms. Many video datasets have metadata containing measurements from inertial sensors, GPS units, etc. Hence the development of better video processing algorithms using additional information attains special significance. We first describe an intensity-based algorithm for stabilizing low resolution and low quality aerial videos. The primary contribution is the idea of minimizing the discrepancy in the intensity of selected pixels between two images. This is an application of inverse compositional alignment for registering images of low resolution and low quality, for which minimizing the intensity difference over salient pixels with high gradients results in faster and better convergence than when using all the pixels. Secondly, we describe a feature-based method for stabilization of aerial videos and segmentation of small moving objects. We use the coherency of background motion to jointly track features through the sequence. This enables accurate tracking of large numbers of features in the presence of repetitive texture, lack of well conditioned feature windows etc. We incorporate the segmentation problem within the joint feature tracking framework and propose the first combined joint-tracking and segmentation algorithm. The proposed approach enables highly accurate tracking, and segmentation of feature tracks that is used in a MAP-MRF framework for obtaining dense pixelwise labeling of the scene. We demonstrate competitive moving object detection in challenging video sequences of the VIVID dataset containing moving vehicles and humans that are small enough to cause background subtraction approaches to fail. Structure from Motion (SfM) has matured to a stage, where the emphasis is on developing fast, scalable and robust algorithms for large reconstruction problems. The availability of additional sensors such as inertial units and GPS along with video cameras motivate the development of SfM algorithms that leverage these additional measurements. In the third part, we study the benefits of the availability of a specific form of additional information - the vertical direction (gravity) and the height of the camera both of which can be conveniently measured using inertial sensors, and a monocular video sequence for 3D urban modeling. We show that in the presence of this information, the SfM equations can be rewritten in a bilinear form. This allows us to derive a fast, robust, and scalable SfM algorithm for large scale applications. The proposed SfM algorithm is experimentally demonstrated to have favorable properties compared to the sparse bundle adjustment algorithm. We provide experimental evidence indicating that the proposed algorithm converges in many cases to solutions with lower error than state-of-art implementations of bundle adjustment. We also demonstrate that for the case of large reconstruction problems, the proposed algorithm takes lesser time to reach its solution compared to bundle adjustment. We also present SfM results using our algorithm on the Google StreetView research dataset, and several other datasets.
  • Item
    Localizing the Effects of Link Flooding Attacks in the Internet
    (2009) Lee, Soo Bum; Gligor, Virgil D.; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Malware-contaminated hosts organized as a ``bot network'' can target and flood network links (e.g., routers). Yet, none of the countermeasures to link flooding proposed to date have provided dependable link access (i.e., link access guarantees) for legitimate traffic during such attacks. Network-layer capabilities offer strong protection against link flooding by authorizing individual flows with unforgeable credentials (i.e., capabilities). However, network-layer capabilities are insufficient for dependable link access, for several reasons: (1) the capability-setup channel is vulnerable to flooding attacks that prevent legitimate clients from acquiring capabilities; i.e., Denial of Capability (DoC) attacks, (2) compromised attack sources that have acquired capabilities in a legitimate way can flood the privileged channel reserved for capability carrying packets, and (3) the global effects of flooding attacks are still unavoidable with ``per-flow'' based capabilities. In this dissertation, we present a router-level design that confines the effects of link flooding attacks to specified locales or neighborhoods (e.g., one or more administrative domains of the Internet) based on network-layer capabilities. Our design provides differential guarantees for access to network links that favor packets from uncontaminated domains by attack sources (e.g., bots) and yet do not deny access to packets from contaminated domains. For connection-request packets (i.e., capability requests), differential access guarantees are defined as the probabilistic lower bounds for link access: requests from uncontaminated domains have higher probabilistic lower bounds for link access than those from contaminated domains. For all other packets, differential access guarantees are defined in terms of the the bandwidth allocated to packet flows; i.e., flows of malware-uncontaminated domains receive higher bandwidth guarantees than flows of contaminated ones, and legitimate flows of contaminated domains are guaranteed substantially higher bandwidth than attack flows. Potential side-effects of attack flows (e.g., multiple congested links) are mitigated by a differential routing scheme, whereby flows of malware-uncontaminated domains are routed through less congested paths while those of contaminated domains are routed through the ``pinned'' default paths. We present analytical models for the proposed notions of dependable link access, and evaluate our router design both by comprehensive simulations under different attack scenarios and by comparisons with other flooding-defense schemes.
  • Item
    VISUAL TRACKING AND ILLUMINATION RECOVERY VIA SPARSE REPRESENTATION
    (2009) Mei, Xue; Jacobs, David W; Qu, Gang; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Compressive sensing, or sparse representation, has played a fundamental role in many fields of science. It shows that the signals and images can be reconstructed from far fewer measurements than what is usually considered to be necessary. Sparsity leads to efficient estimation, efficient compression, dimensionality reduction, and efficient modeling. Recently, there has been a growing interest in compressive sensing in computer vision and it has been successfully applied to face recognition, background subtraction, object tracking and other problems. Sparsity can be achieved by solving the compressive sensing problem using L1 minimization. In this dissertation, we present the results of a study of applying sparse representation to illumination recovery, object tracking, and simultaneous tracking and recognition. Illumination recovery, also known as inverse lighting, is the problem of recovering an illumination distribution in a scene from the appearance of objects located in the scene. It is used for Augmented Reality, where the virtual objects match the existing image and cast convincing shadows on the real scene rendered with the recovered illumination. Shadows in a scene are caused by the occlusion of incoming light, and thus contain information about the lighting of the scene. Although shadows have been used in determining the 3D shape of the object that casts shadows onto the scene, few studies have focused on the illumination information provided by the shadows. In this dissertation, we recover the illumination of a scene from a single image with cast shadows given the geometry of the scene. The images with cast shadows can be quite complex and therefore cannot be well approximated by low-dimensional linear subspaces. However, in this study we show that the set of images produced by a Lambertian scene with cast shadows can be efficiently represented by a sparse set of images generated by directional light sources. We first model an image with cast shadows as composed of a diffusive part (without cast shadows) and a residual part that captures cast shadows. Then, we express the problem in an L1-regularized least squares formulation, with nonnegativity constraints (as light has to be nonnegative at any point in space). This sparse representation enjoys an effective and fast solution, thanks to recent advances in compressive sensing. In experiments on both synthetic and real data, our approach performs favorably in comparison to several previously proposed methods. Visual tracking, which consistently infers the motion of a desired target in a video sequence, has been an active and fruitful research topic in computer vision for decades. It has many practical applications such as surveillance, human computer interaction, medical imaging and so on. Many challenges to design a robust tracking algorithm come from the enormous unpredictable variations in the target, such as deformations, fast motion, occlusions, background clutter, and lighting changes. To tackle the challenges posed by tracking, we propose a robust visual tracking method by casting tracking as a sparse approximation problem in a particle filter framework. In this framework, occlusion, noise and other challenging issues are addressed seamlessly through a set of trivial templates. Specifically, to find the tracking target at a new frame, each target candidate is sparsely represented in the space spanned by target templates and trivial templates. The sparsity is achieved by solving an L1-regularized least squares problem. Then the candidate with the smallest projection error is taken as the tracking target. After that, tracking is continued using a Bayesian state inference framework in which a particle filter is used for propagating sample distributions over time. Three additional components further improve the robustness of our approach: 1) a velocity incorporated motion model that helps concentrate the samples on the true target location in the next frame, 2) the nonnegativity constraints that help filter out clutter that is similar to tracked targets in reversed intensity patterns, and 3) a dynamic template update scheme that keeps track of the most representative templates throughout the tracking procedure. We test the proposed approach on many challenging sequences involving heavy occlusions, drastic illumination changes, large scale changes, non-rigid object movement, out-of-plane rotation, and large pose variations. The proposed approach shows excellent performance in comparison with four previously proposed trackers. We also extend the work to simultaneous tracking and recognition in vehicle classification in IR video sequences. We attempt to resolve the uncertainties in tracking and recognition at the same time by introducing a static template set that stores target images in various conditions such as different poses, lighting, and so on. The recognition results at each frame are propagated to produce the final result for the whole video. The tracking result is evaluated at each frame and low confidence in tracking performance initiates a new cycle of tracking and classification. We demonstrate the robustness of the proposed method on vehicle tracking and classification using outdoor IR video sequences.
  • Item
    SECURE, POLICY-BASED, MULTI-RECIPIENT DATA SHARING
    (2009) Bobba, Rakesh B.; Gligor, Virgil D.; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In distributed systems users often need to share sensitive data with other users based on the latter's ability to satisfy various policies. In many cases the data owner may not even know the identities of the data recipients, but deems it crucial that they are legitimate; i.e., satisfy the policy. Enabling such data sharing over the Internet faces the challenge of (1) securely associating access policies with data and enforcing them, and (2) protecting data as it traverses untrusted proxies and intermediate repositories. Furthermore, it is desirable to achieve properties such as: (1) flexibility of access policies; (2) privacy of sensitive access policies; (3) minimal reliance on trusted third parties; and (4) efficiency of access policy enforcement. Often schemes enabling controlled data sharing need to trade one property for another. In this dissertation, we propose two complimentary policy-based data sharing schemes that achieve different subsets of the above desired properties. In the first part of this dissertation, we focus on CiphertextPolicy Attribute- Based Encryption (CP-ABE) schemes that specify and enforce access policies cryptographically and eliminate trusted mediators. We motivate the need for flexible attribute organization within user keys for efficient support of many practical applications. We then propose Ciphertext-Policy Attribute-Set Based Encryption (CP-ASBE) which is the first CP-ABE scheme to (1) efficiently support naturally occurring compound attributes, (2) support multiple numerical assignments for a given attribute in a single key and (3) provide efficient key management. While the CP-ASBE scheme minimizes reliance on trusted mediators, it can support neither context-based policies nor policy privacy. In the second part of this dissertation, we propose Policy Based Encryption System (PBES), which employs mediated decryption and supports both context-based policies and policy privacy. Finally, we integrate the proposed schemes into practical applications (i.e., CP-ASBE scheme with Attribute-Based Messaging (ABM) and PBES scheme with a conditional data sharing application in the Power Grid) and demonstrate their usefulness in practice.
  • Item
    Robust and Efficient Inference of Scene and Object Motion in Multi-Camera Systems
    (2009) Sankaranarayanan, Aswin C; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Multi-camera systems have the ability to overcome some of the fundamental limitations of single camera based systems. Having multiple view points of a scene goes a long way in limiting the influence of field of view, occlusion, blur and poor resolution of an individual camera. This dissertation addresses robust and efficient inference of object motion and scene in multi-camera and multi-sensor systems. The first part of the dissertation discusses the role of constraints introduced by projective imaging towards robust inference of multi-camera/sensor based object motion. We discuss the role of the homography and epipolar constraints for fusing object motion perceived by individual cameras. For planar scenes, the homography constraints provide a natural mechanism for data association. For scenes that are not planar, the epipolar constraint provides a weaker multi-view relationship. We use the epipolar constraint for tracking in multi-camera and multi-sensor networks. In particular, we show that the epipolar constraint reduces the dimensionality of the state space of the problem by introducing a ``shared'' state space for the joint tracking problem. This allows for robust tracking even when one of the sensors fail due to poor SNR or occlusion. The second part of the dissertation deals with challenges in the computational aspects of tracking algorithms that are common to such systems. Much of the inference in the multi-camera and multi-sensor networks deal with complex non-linear models corrupted with non-Gaussian noise. Particle filters provide approximate Bayesian inference in such settings. We analyze the computational drawbacks of traditional particle filtering algorithms, and present a method for implementing the particle filter using the Independent Metropolis Hastings sampler, that is highly amenable to pipelined implementations and parallelization. We analyze the implementations of the proposed algorithm, and in particular concentrate on implementations that have minimum processing times. The last part of the dissertation deals with the efficient sensing paradigm of compressing sensing (CS) applied to signals in imaging, such as natural images and reflectance fields. We propose a hybrid signal model on the assumption that most real-world signals exhibit subspace compressibility as well as sparse representations. We show that several real-world visual signals such as images, reflectance fields, videos etc., are better approximated by this hybrid of two models. We derive optimal hybrid linear projections of the signal and show that theoretical guarantees and algorithms designed for CS can be easily extended to hybrid subspace-compressive sensing. Such methods reduce the amount of information sensed by a camera, and help in reducing the so called data deluge problem in large multi-camera systems.
  • Item
    Robust Methods for Visual Tracking and Model Alignment
    (2009) Wu, Hao; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The ubiquitous presence of cameras and camera networks needs the development of robust visual analytics algorithms. As the building block of many video visual surveillance tasks, a robust visual tracking algorithm plays an important role in achieving the goal of automatic and robust surveillance. In practice, it is critical to know when and where the tracking algorithm fails so that remedial measures can be taken to resume tracking. We propose a novel performance evaluation strategy for tracking systems using a time-reversed Markov chain. We also present a novel bidirectional tracker to achieve better robustness. Instead of looking only forward in the time domain, we incorporate both forward and backward processing of video frames using a time-reversibility constraint. When the objects of interest in surveillance applications have relatively stable structures, the parameterized shape model of objects can be usually built or learned from sample images, which allows us to perform more accurate tracking. We present a machine learning method to learn a scoring function without local extrema to guide the gradient descent/accent algorithm and find the optimal parameters of the shape model. These algorithms greatly improve the robustness of video analysis systems in practice.
  • Item
    Random Codes and Graphs for Secure Communication
    (2009) Anthapadmanabhan, Nagaraj Prasanth; Barg, Alexander; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This dissertation considers two groups of problems related to secure communication. The first line of research is devoted to theoretical problems of copyright protection of digital content. Embedding identification data in the content is a well-developed technique of content protection known under the name of fingerprinting. Schemes that provide such protection are known as fingerprinting codes in the literature. We study limits of the number of users of a fingerprinting system as well as constructions of low-complexity fingerprinting codes that support a large number of users. The second problem that is addressed in the dissertation relates to connectivity analysis of ad hoc wireless networks. One of the basic requirements in such environments is to ensure that none of the nodes are completely isolated from the network. We address the problem of characterizing threshold parameters for node isolation that enable the system designer to choose the power needed for network operation based on the outage probability of links in the network. The methods of this research draw from coding theory, information theory and random graphs. An idea that permeates most results in this dissertation is the application of randomization both in the analysis of fingerprinting and node isolation. The main contributions of this dissertation belong in the area of fingerprinting and are described as follows. We derive new lower and upper bounds on the optimal trade-off between the number of users and the length of the fingerprints required to ensure reliability of the system, which we call fingerprinting capacity. Information-theoretic techniques employed in our proofs of bounds on capacity originate in coding theorems for channels with multiple inputs. Constructions of fingerprinting codes draw on methods of coding theory related to list decoding and code concatenation. We also analyze random graph models for ad hoc networks with link failures and secure sensor networks that employ randomized key distribution. We establish a precise zero-one law for node isolation in the model with link failures for nodes placed on the circle. We further generalize this result to obtain a one-law for secure sensor networks on some surfaces.
  • Item
    SIMULTANEOUS MULTI-VIEW FACE TRACKING AND RECOGNITION IN VIDEO USING PARTICLE FILTERING
    (2009) Seo, Naotoshi; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Recently, face recognition based on video has gained wide interest especially due to its role in surveillance systems. Video-based recognition has superior advantages over image-based recognition because a video contains image sequences as well as temporal information. However, surveillance videos are generally of low-resolution and contain faces mostly in non-frontal poses. We propose a multi-view, video-based face recognition algorithm using the Bayesian inference framework. This method represents an appearance of each subject by a complex nonlinear appearance manifold expressed as a collection of simpler pose manifolds and the connections, represented by transition probabilities, among them. A Bayesian inference formulation is introduced to utilize the temporal information in the video via the transition probabilities among pose manifolds. The Bayesian inference formulation realizes video-based face recognition by progressively accumulating the recognition confidences in frames. The accumulation step possibly enables to solve face recognition problems in low-resolution videos, and the progressive characteristic is especially useful for a real-time processing. Furthermore, this face recognition framework has another characteristic that does not require processing all frames in a video if enough recognition confidence is accumulated in an intermediate frame. This characteristic gives an advantage over batch methods in terms of a computational efficiency. Furthermore, we propose a simultaneous multi-view face tracking and recognition algorithm. Conventionally, face recognition in a video is performed in tracking-then-recognition scenario that extracts the best facial image patch in the tracking and then recognizes the identity of the facial image. Simultaneous face tracking and recognition works in a different fashion, by handling both tracking and recognition simultaneously. Particle filter is a technique for implementing a Bayesian inference filter by Monte Carlo simulation, which has gained prevalence in the visual tracking literature since the Condensation algorithm was introduced. Since we have proposed a video-based face recognition algorithm based on the Bayesian inference framework, it is easy to integrate the particle filter tracker and our proposed recognition method into one, using the particle filter for both tracking and recognition simultaneously. This simultaneous framework utilizes the temporal information in a video for not only tracking but also recognition by modeling the dynamics of facial poses. Although the time series formulation remains more general, only the facial pose dynamics is utilized for recognition in this thesis.
  • Item
    Modeling Shape, Appearance and Motion for Human Movement Analysis
    (2009) Lin, Zhe; Davis, Larry S; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Shape, Appearance and Motion are the most important cues for analyzing human movements in visual surveillance. Representation of these visual cues should be rich, invariant and discriminative. We present several approaches to model and integrate them for human detection and segmentation, person identification, and action recognition. First, we describe a hierarchical part-template matching approach to simultaneous human detection and segmentation combining local part-based and global shape-based schemes. For learning generic human detectors, a pose-adaptive representation is developed based on a hierarchical tree matching scheme and combined with an support vector machine classifier to perform human/non-human classification. We also formulate multiple occluded human detection using a Bayesian framework and optimize it through an iterative process. We evaluated the approach on several public pedestrian datasets. Second, given regions of interest provided by human detectors, we introduce an approach to iteratively estimates segmentation via a generalized Expectation-Maximization algorithm. The approach incorporates local Markov random field constraints and global pose inferences to propagate beliefs over image space iteratively to determine a coherent segmentation. Additionally, a layered occlusion model and a probabilistic occlusion reasoning scheme are introduced to handle inter-occlusion. The approach is tested on a wide variety of real-life images. Third, we describe an approach to appearance-based person recognition. In learning, we perform discriminative analysis through pairwise coupling of training samples, and estimate a set of normalized invariant profiles by marginalizing likelihood ratio functions which reflect local appearance differences. In recognition, we calculate discriminative information-based distances by a soft voting approach, and combine them with appearance-based distances for nearest neighbor classification. We evaluated the approach on videos of 61 individuals under significant illumination and viewpoint changes. Fourth, we describe a prototype-based approach to action recognition. During training, a set of action prototypes are learned in a joint shape and motion space via $k$-means clustering; During testing, humans are tracked while a frame-to-prototype correspondence is established by nearest neighbor search, and then actions are recognized using dynamic prototype sequence matching. Similarity matrices used for sequence matching are efficiently obtained by look-up table indexing. We experimented the approach on several action datasets.