State Space Approaches for Modeling Activities in Video Streams
The objective is to discern events and behavior in activities from video sequences in a way that conforms to common human experience. Activity modeling has several applications, such as recognition, temporal segmentation, video indexing and anomaly detection. It offers compelling challenges to computational vision systems at several levels, ranging from low-level vision tasks for detection and segmentation to high-level models for extracting perceptually salient information. With a focus on the latter, the following approaches are presented: event detection in discrete state space, epitomic representation in continuous state space, temporal segmentation using mixed state models, key frame detection using antieigenvalues, and spatio-temporal activity volumes.
Significant changes in motion properties are defined as events. We present an event probability sequence representation in which the probability of event occurrence is computed from stable changes at the state level of the discrete state hidden Markov model that generates the observed trajectories. Reliance on a trained model, however, can be a limitation. A data-driven antieigenvalue-based approach is therefore proposed for detecting changes. Antieigenvalues are sensitive to turnings, whereas eigenvalues capture directions of maximum variance in the data. In both of these approaches, events are assumed to be instantaneous quantities. This assumption is relaxed using an epitomic representation in continuous state space.
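The contrast between eigenvalues and antieigenvalues drawn above can be illustrated with a small sketch. It uses the standard closed form for the first antieigenvalue of a symmetric positive (semi)definite operator, 2*sqrt(l_min*l_max)/(l_min + l_max), where l_min and l_max are the extreme eigenvalues. How the operator is built from video data is not stated in this abstract; the sliding-window covariance of 2-D trajectory points, the function names and the window length below are all assumptions made for illustration only.

```python
import math

def covariance_2d(points):
    """Return (var_x, cov_xy, var_y) of a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    return var_x, cov_xy, var_y

def first_antieigenvalue(var_x, cov_xy, var_y, eps=1e-12):
    """First antieigenvalue of the symmetric matrix [[var_x, cov_xy], [cov_xy, var_y]]."""
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = 0.5 * (var_x + var_y)
    radius = math.hypot(0.5 * (var_x - var_y), cov_xy)
    lam_max = mean + radius
    lam_min = max(mean - radius, 0.0)  # clamp tiny negative round-off
    if lam_max < eps:
        return 0.0  # convention chosen here for a window with no motion
    return 2.0 * math.sqrt(lam_min * lam_max) / (lam_min + lam_max)

def antieigenvalue_profile(trajectory, window=5):
    """Antieigenvalue of the point covariance in each sliding window (assumed setup)."""
    return [
        first_antieigenvalue(*covariance_2d(trajectory[i:i + window]))
        for i in range(len(trajectory) - window + 1)
    ]

# Hypothetical L-shaped trajectory: straight along x, then a 90-degree turn.
l_shaped = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]
profile = antieigenvalue_profile(l_shaped, window=5)
peak = max(range(len(profile)), key=profile.__getitem__)
```

In this toy setup the profile is zero on windows that contain only straight-line motion and largest on the window centered on the turn, so a peak in the profile marks a candidate change point. The largest eigenvalue alone would report how far the object moved, not whether it turned.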
Video sequences are segmented using a sliding window within which the dynamics of each object are assumed to be linear. The system matrix, initial state value and input signal statistics are said to form an epitome. The system matrices are decomposed using the Iwasawa matrix decomposition to isolate the effects of rotation, scaling and projection of the state vector. The decomposition is used to compute physically meaningful distances between epitomes. Epitomes reveal dominant primitives of activities that have an abstracted interpretation. A mixed state approach for activities is presented in which higher-level primitives of behavior are encoded in the discrete state component and observed dynamics in the continuous state component. The effectiveness of mixed state models is demonstrated using temporal segmentation. In addition to motion trajectories, the volume carved out in an xyt cube by a moving object is characterized using Morse functions.
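As a rough illustration of the decomposition step, the sketch below factors a 2x2 matrix with positive determinant as M = K A N, where K is a rotation, A is a positive diagonal scaling and N is unit upper triangular. This is the textbook Iwasawa (KAN) factorization, which for a real matrix coincides with a QR factorization with positive diagonal. The restriction to 2x2 matrices with positive determinant, the function names and the reading of the N factor as the "projection" term are assumptions; the state dimension and the distance actually used between epitomes are not given in this abstract.

```python
import math

def iwasawa_2x2(m):
    """Factor a 2x2 matrix with det > 0 as K(theta) @ diag(a1, a2) @ [[1, shear], [0, 1]]."""
    (m11, m12), (m21, m22) = m
    det = m11 * m22 - m12 * m21
    if det <= 0:
        raise ValueError("this sketch assumes a positive determinant")
    r = math.hypot(m11, m21)             # length of the first column
    c, s = m11 / r, m21 / r              # rotation aligning the first column with the x axis
    theta = math.atan2(m21, m11)         # rotation angle of K
    scales = (r, det / r)                # diagonal of A, both entries positive
    shear = (c * m12 + s * m22) / r      # off-diagonal entry of N
    return theta, scales, shear

def compose_2x2(theta, scales, shear):
    """Rebuild K @ A @ N from its three factors, as a consistency check."""
    c, s = math.cos(theta), math.sin(theta)
    a1, a2 = scales
    return [
        [c * a1, c * a1 * shear - s * a2],
        [s * a1, s * a1 * shear + c * a2],
    ]

# Hypothetical system matrix: 90-degree rotation, scales (2, 0.5), shear 3.
system_matrix = [[0.0, -0.5], [2.0, 6.0]]
theta, scales, shear = iwasawa_2x2(system_matrix)
rebuilt = compose_2x2(theta, scales, shear)
```

Separating the factors this way lets each be compared in its own units (an angle, a pair of scale ratios, a shear), which is one plausible route to a physically interpretable distance between two system matrices.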