The work described here aims at improving the performance of three building blocks of visual surveillance systems: foreground detection, object tracking and event detection.

First, a new background subtraction algorithm is presented for foreground detection. The background model is built with a set of codewords for every pixel. Each codeword contains the pixel's principal color and a tangent vector that represents the color variation at that pixel. As the scene illumination changes, a pixel's color is predicted using a linear model of the codeword, and the codeword, in turn, is updated using the new observation. We carried out a number of experiments on sequences that have extensive lighting change and compared the results with those of previously developed algorithms.
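The codeword idea above can be illustrated with a minimal sketch. It is not the algorithm from the dissertation: the names `Codeword`, `residual`, and `classify_and_update`, the distance threshold, and the running-average update are all assumptions chosen for illustration. The sketch treats a codeword as a line in color space (principal color plus a multiple of the tangent vector) and labels an observation as background when it lies close to that line.

```python
from dataclasses import dataclass


@dataclass
class Codeword:
    color: tuple    # principal color, e.g. (r, g, b)
    tangent: tuple  # direction in which the color moves as lighting changes


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residual(cw, obs):
    """Return (distance from obs to the line color + s * tangent, best s)."""
    diff = tuple(o - c for o, c in zip(obs, cw.color))
    tt = _dot(cw.tangent, cw.tangent)
    s = _dot(diff, cw.tangent) / tt if tt > 0 else 0.0
    pred = tuple(c + s * t for c, t in zip(cw.color, cw.tangent))
    err = sum((o - p) ** 2 for o, p in zip(obs, pred)) ** 0.5
    return err, s


def classify_and_update(codewords, obs, threshold=10.0, rate=0.05):
    """Label obs and, on a match, blend it into the matching codeword.

    threshold and rate are hypothetical tuning parameters. Only the
    principal color is updated here; the tangent is left unchanged.
    """
    for i, cw in enumerate(codewords):
        err, _ = residual(cw, obs)
        if err <= threshold:
            new_color = tuple((1 - rate) * c + rate * o
                              for c, o in zip(cw.color, obs))
            codewords[i] = Codeword(new_color, cw.tangent)
            return "background"
    return "foreground"
```

In this sketch a uniform brightening of a gray pixel moves the observation along the tangent `(1, 1, 1)` and so still matches the codeword, while a change in hue moves it off the line and is reported as foreground.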

Second, we describe a multi-resolution tracking framework developed with efficiency and robustness in mind. Efficiency is achieved by processing low-resolution data whenever possible. Robustness results from multi-level coarse-to-fine searching in the tracking state space. We combine sequential filtering both in time and across resolution levels into a probabilistic framework. A color blob tracker is implemented and the tracking results are evaluated in a number of experiments.
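The coarse-to-fine search described above can be sketched as follows. This is a deterministic simplification, not the probabilistic filter from the dissertation: it assumes the input is a hypothetical per-pixel score map (for example, how well each pixel matches a blob's color), and the function names, the 2x2 averaging pyramid, and the `radius` parameter are illustrative choices.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def build_pyramid(img, levels):
    """Return [finest, ..., coarsest]."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def argmax_in_window(img, cy, cx, radius):
    """Best-scoring (y, x) within radius of (cy, cx), clipped to the image."""
    h, w = len(img), len(img[0])
    best = None
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            if best is None or img[y][x] > img[best[0]][best[1]]:
                best = (y, x)
    return best


def coarse_to_fine_peak(img, levels=3, radius=2):
    """Search the coarsest level exhaustively, then refine level by level."""
    pyramid = build_pyramid(img, levels)
    coarse = pyramid[-1]
    y, x = argmax_in_window(coarse, 0, 0, max(len(coarse), len(coarse[0])))
    for level in range(levels - 2, -1, -1):
        # Each coarse pixel covers a 2x2 block at the next finer level.
        y, x = argmax_in_window(pyramid[level], 2 * y, 2 * x, radius)
    return y, x
```

The full-resolution image is only examined inside a small window around the estimate carried down from the coarser levels, which is where the efficiency gain in this kind of scheme comes from.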

Third, we present a tracking algorithm based on motion analysis of regional affine-invariant image features. The tracked object is represented with a probabilistic occupancy map. Using this map as support, regional features are detected and matched across frames. The motion of pixels is then established based on the feature motion. The object occupancy map is, in turn, updated according to the pixel motion consistency. We describe experiments that measure the sensitivity of our approach to inaccurate initialization, and compare it with other approaches.
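The occupancy-map update step can be illustrated with a small sketch. The update rule below is an assumption made for illustration and is not taken from the dissertation: each pixel's occupancy probability is blended with a Gaussian consistency score that compares the pixel's motion with a single object motion vector. The function name and the `sigma` and `rate` parameters are hypothetical.

```python
from math import exp


def update_occupancy(occupancy, pixel_motion, object_motion,
                     sigma=1.0, rate=0.5):
    """Blend each occupancy value with a motion-consistency score.

    occupancy     -- list of probabilities, one per pixel
    pixel_motion  -- list of (vx, vy) motion vectors, one per pixel
    object_motion -- (vx, vy) motion attributed to the tracked object
    """
    updated = []
    for p, (vx, vy) in zip(occupancy, pixel_motion):
        d2 = (vx - object_motion[0]) ** 2 + (vy - object_motion[1]) ** 2
        consistency = exp(-d2 / (2 * sigma ** 2))  # 1.0 when motions agree
        updated.append((1 - rate) * p + rate * consistency)
    return updated
```

Pixels that move with the object gain occupancy over time, while pixels that move differently decay toward zero, so an imprecise initial map can correct itself as frames accumulate.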

Fourth, we address the problem of visual event recognition in surveillance, where noise and missing observations are serious problems. Commonsense domain knowledge is exploited to overcome them. The knowledge is represented as first-order logic production rules with associated weights that indicate their confidence. These rules are used in combination with a relaxed deduction algorithm to construct a network of grounded atoms, the Markov Logic Network. The network is used to perform probabilistic inference for input queries about events of interest. The system's performance is demonstrated on a number of videos from a parking lot domain that contains complex interactions of people and vehicles.
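To make the role of weighted rules concrete, here is a toy sketch of inference in a Markov Logic Network. It uses the standard definition, in which the probability of a world is proportional to the exponential of the summed weights of the rules it satisfies, and answers a query by brute-force enumeration rather than by the relaxed deduction algorithm mentioned above. The atoms, rules, and weights are invented for illustration and do not come from the dissertation.

```python
from itertools import product
from math import exp

# Hypothetical ground atoms for a parking lot scene.
ATOMS = ["OpensTrunk", "OwnsCar", "Theft"]

# (weight, formula) pairs; a higher weight means higher confidence.
RULES = [
    # Someone who opens a trunk either owns the car or is stealing.
    (2.0, lambda w: (not w["OpensTrunk"]) or w["OwnsCar"] or w["Theft"]),
    # Theft is unlikely a priori.
    (1.5, lambda w: not w["Theft"]),
]


def query(target, evidence):
    """P(target is true | evidence), by enumerating all consistent worlds."""
    free = [a for a in ATOMS if a not in evidence]
    num = den = 0.0
    for values in product([False, True], repeat=len(free)):
        world = dict(evidence)
        world.update(zip(free, values))
        # Unnormalized probability: exp of the total weight of satisfied rules.
        score = exp(sum(wt for wt, rule in RULES if rule(world)))
        den += score
        if world[target]:
            num += score
    return num / den
```

A call such as `query("Theft", {"OpensTrunk": True, "OwnsCar": False})` yields a higher probability than `query("Theft", {"OpensTrunk": True})`, because ruling out ownership leaves theft as the only way to satisfy the first rule. Missing observations are handled by summing over the unobserved atoms. Enumeration is exponential in the number of atoms, so it is only practical for toy examples like this one.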