Computer Science Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2756

  • Analyzing Structured Scenarios by Tracking People and Their Limbs
    (2010) Morariu, Vlad Ion; Davis, Larry S; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The analysis of human activities is a fundamental problem in computer vision. Though complex, interactions between people and their environment often exhibit a spatio-temporal structure that can be exploited during analysis. This structure can be leveraged to mitigate the effects of missing or noisy visual observations caused, for example, by sensor noise, inaccurate models, or occlusion. Trajectories of people and their hands and feet, often sufficient for recognition of human activities, lead to a natural qualitative spatio-temporal description of these interactions. This work introduces the following contributions to the task of human activity understanding: 1) a framework that efficiently detects and tracks multiple interacting people and their limbs, 2) an event recognition approach that integrates both logical and probabilistic reasoning in analyzing the spatio-temporal structure of multi-agent scenarios, and 3) an effective computational model of the visibility constraints imposed on humans as they navigate through their environment. The tracking framework mixes probabilistic models with deterministic constraints and uses AND/OR search and lazy evaluation to efficiently obtain the globally optimal solution in each frame. Our high-level reasoning framework efficiently and robustly interprets noisy visual observations to deduce the events comprising structured scenarios. This is accomplished by combining First-Order Logic, Allen's Interval Logic, and Markov Logic Networks with an event hypothesis generation process that reduces the size of the ground Markov network. When applied to outdoor one-on-one basketball videos, our framework tracks the players and, guided by the game rules, analyzes their interactions with each other and the ball, annotating the videos with the relevant basketball events that occurred. 
    Finally, motivated by studies of spatial behavior, we use a set of features from visibility analysis to represent spatial context in the interpretation of human spatial activities. We demonstrate the effectiveness of our representation on trajectories generated by humans in a virtual environment.
  • ROBUST TECHNIQUES FOR VISUAL SURVEILLANCE
    (2008-07-29) Tran, Son Dinh; Davis, Larry S; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The work described here aims at improving the performance of three building blocks of visual surveillance systems: foreground detection, object tracking and event detection. First, a new background subtraction algorithm is presented for foreground detection. The background model is built with a set of codewords for every pixel. Each codeword contains the pixel's principal color and a tangent vector that represents the color variation at that pixel. As the scene illumination changes, a pixel's color is predicted using a linear model of the codeword, and the codeword, in turn, is updated using the new observation. We carried out a number of experiments on sequences with extensive lighting changes and compared the results with previously developed algorithms. Second, we describe a multi-resolution tracking framework developed with efficiency and robustness in mind. Efficiency is achieved by processing low-resolution data whenever possible. Robustness results from multi-level coarse-to-fine searching in the tracking state space. We combine sequential filtering across both time and resolution levels into a probabilistic framework. A color blob tracker is implemented and the tracking results are evaluated in a number of experiments. Third, we present a tracking algorithm based on motion analysis of regional affine invariant image features. The tracked object is represented with a probabilistic occupancy map. Using this map as support, regional features are detected and matched across frames. The motion of pixels is then established based on the feature motion. The object occupancy map is in turn updated according to the pixel motion consistency. We describe experiments to measure the sensitivity of our approach to inaccuracy in initialization, and compare it with other approaches. Fourth, we address the problem of visual event recognition in surveillance, where noise and missing observations are serious problems. Commonsense domain knowledge is exploited to overcome them.
    The knowledge is represented as first-order logic production rules with associated weights that indicate their confidence. These rules are used in combination with a relaxed deduction algorithm to construct a network of grounded atoms, the Markov Logic Network. The network is used to perform probabilistic inference for input queries about events of interest. The system's performance is demonstrated on a number of videos from a parking lot domain that contains complex interactions of people and vehicles.