umi-umd-2811.pdf (13.58 MB)
Tracking moving objects is a commonly used approach for understanding surveillance video. However, by focusing on only a few key-frames, it is possible to perform tasks such as image segmentation, recognition, and object detection effectively. In this dissertation we describe several methods for appearance analysis of key-frames: region-based background subtraction; a new method for recognizing persons based on their overall extrinsic appearance, regardless of their (upright) pose; and appearance-based local change detection.

To encode spatial information into an appearance model, we introduce a new feature, path-length, which is defined as the normalized length of the shortest path in the silhouette. The appearance recognition method uses kernel density estimation (KDE) of the probabilities associated with color/path-length profiles, and the Kullback-Leibler (KL) distance to compare such profiles with candidate models. When there is more than one profile to match in a frame, we adopt a multiple-matching algorithm that enforces a 1-to-1 constraint to improve performance. Through a comprehensive set of experiments, we show that, with suitable normalization of the color variables, this method is robust to varying viewpoints, complex illumination, and multiple cameras. Using the probabilities from KDE, we also show that changes in appearance, for instance those caused by carried packages, can be spotted easily.
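The matching pipeline described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names (`kde_profile`, `kl_distance`, `match_one_to_one`), the Gaussian kernel, the bandwidth, the grid size, and the brute-force assignment search are all assumptions chosen for brevity. Each sample is assumed to be a (color, path-length) pair with both values already normalized to [0, 1].

```python
import math
from itertools import permutations


def kde_profile(samples, bins=8, bandwidth=0.15):
    """Gaussian KDE of (color, path_length) samples on a bins x bins grid.

    Returns a flat list of probabilities that sums to 1.
    Kernel, bandwidth, and grid size are illustrative assumptions.
    """
    centers = [(k + 0.5) / bins for k in range(bins)]
    density = []
    for c in centers:
        for l in centers:
            density.append(sum(
                math.exp(-0.5 * ((c - sc) ** 2 + (l - sl) ** 2) / bandwidth ** 2)
                for sc, sl in samples
            ))
    total = sum(density)
    return [d / total for d in density]


def kl_distance(p, q, eps=1e-12):
    """KL divergence D(p || q); eps guards against log(0).

    Note that this quantity is not symmetric in p and q.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def match_one_to_one(cost):
    """Assign each profile (row) to a distinct model (column).

    Exhaustive search over assignments, minimizing the total cost.
    Only practical for a handful of profiles; requires rows <= columns.
    """
    n, m = len(cost), len(cost[0])
    best = min(
        permutations(range(m), n),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    return list(best)


# Hypothetical data: two stored models and two profiles seen in one frame.
models = [
    kde_profile([(0.20, 0.30), (0.25, 0.35), (0.15, 0.30)]),
    kde_profile([(0.80, 0.70), (0.75, 0.65), (0.85, 0.70)]),
]
observed = [
    kde_profile([(0.78, 0.72), (0.82, 0.68)]),
    kde_profile([(0.22, 0.28), (0.18, 0.33)]),
]

cost = [[kl_distance(obs, mod) for mod in models] for obs in observed]
assignment = match_one_to_one(cost)  # assignment[i] = model index for profile i
```

The exhaustive search is used only to keep the sketch dependency-free; with more than a few people per frame, a polynomial-time assignment solver would be the natural substitute.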

Lastly, we describe an approach for constructing a gallery of people observed in a video stream. We consider two scenarios that require determining the number and identity of participants: outdoor surveillance and meeting rooms. In these applications, face identification is typically not feasible because of the low resolution across the face. The proposed approach automatically computes an appearance model based on people's clothing and employs this model in constructing and matching the gallery of participants. In the meeting room scenario, we exploit the fact that the relative locations of subjects are likely to remain unchanged for the whole sequence to construct a more compact gallery.