Shape Dynamical Models for Activity Recognition and Coded Aperture Imaging for Light-Field Capture

umi-umd-5904.pdf (9.47 MB)

Classical applications of pattern recognition in image processing and computer vision have typically dealt with modeling, learning and recognizing static patterns in images and videos. There is, of course, a whole class of patterns in nature that evolve dynamically over time. Human activities, behaviors of insects and animals, facial expression changes, lip motion during speech, and genetic expression profiles are some examples of dynamic patterns. Models and algorithms for studying these patterns must account for their dynamics while exploiting classical pattern recognition techniques. The first part of this dissertation is an attempt to model and recognize such dynamically evolving patterns. We look at specific instances, such as human activities and insect behaviors, and develop algorithms to learn models of these patterns and classify them. The models and algorithms proposed are validated by extensive experiments on gait-based person identification, activity recognition, and simultaneous tracking and behavior analysis of insects.

The problem of comparing dynamically deforming shape sequences arises repeatedly in problems such as activity recognition and lip reading. We describe and evaluate parametric and non-parametric models for shape sequences. In particular, we emphasize the need to model variations in activity execution rate and propose a non-parametric model that is insensitive to such variations. These models and the resulting algorithms are shown to be extremely effective for a wide range of applications, from gait-based person identification to human action recognition. We further show that the shape dynamical models are not only effective for recognition but can also serve as effective priors for simultaneous tracking and behavior analysis. We validate the proposed algorithm for simultaneous behavior analysis and tracking on videos of bees dancing in a hive.
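
To make the idea of rate-insensitive sequence comparison concrete, the sketch below uses dynamic time warping (DTW), a standard technique for aligning sequences that unfold at different speeds. This abstract does not name the non-parametric model it proposes, so DTW is a stand-in chosen for illustration, not the dissertation's method. The function name `dtw_distance`, the Euclidean frame distance, and the representation of each frame as a plain feature vector are all assumptions.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors.

    Illustrative assumption: each row is one frame's shape descriptor and
    frames are compared with the Euclidean norm.
    """
    a = np.asarray(seq_a, dtype=float)
    b = np.asarray(seq_b, dtype=float)
    n, m = len(a), len(b)

    # cost[i, j] = cheapest alignment of the first i frames of a
    # with the first j frames of b.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # A frame may be matched to several frames of the other
            # sequence, which absorbs differences in execution rate.
            cost[i, j] = d + min(cost[i - 1, j],      # a advances, b holds
                                 cost[i, j - 1],      # b advances, a holds
                                 cost[i - 1, j - 1])  # both advance
    return cost[n, m]
```

Under these assumptions, a sequence and a half-speed copy of it (every frame repeated twice, for example via `np.repeat(seq, 2, axis=0)`) have a DTW distance of zero, whereas a frame-by-frame comparison would not even be defined for sequences of different length.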

In the last part of this dissertation, we investigate computational imaging, an emerging field where the process of image formation involves the use of a computer. The current trend in computational imaging is to capture as much information about the scene as possible at capture time so that appropriate images with varying focus, aperture, blur and colorimetric settings may be rendered as required. In this regard, capturing the 4D light field as opposed to a 2D image allows us to freely vary viewpoint and focus at the time of rendering an image. In this dissertation, we describe a theoretical framework for reversibly modulating 4D light fields using an attenuating mask in the optical path of a lens-based camera. Based on this framework, we present a novel design to reconstruct the 4D light field from a 2D camera image without any additional refractive elements as required by previous light field cameras. The patterned mask attenuates light rays inside the camera instead of bending them, and the attenuation recoverably encodes the rays on the 2D sensor. Our mask-equipped camera focuses just like a traditional camera to capture conventional 2D photos at full sensor resolution, but the raw pixel values also hold a modulated 4D light field. The light field can be recovered by rearranging the tiles of the 2D Fourier transform of sensor values into 4D planes, and computing the inverse Fourier transform. In addition, one can also recover the full resolution image information for the in-focus parts of the scene.
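
The recovery step described above (2D Fourier transform, tile rearrangement, inverse 4D transform) can be sketched in a few lines of NumPy. This is a minimal illustration of the rearrangement only, not the dissertation's implementation. The function names, the assumption that the sensor spectrum divides into an exact grid of `n_u` by `n_v` equal tiles, and the axis ordering of the 4D array are all hypothetical. A real pipeline would also need mask-dependent scaling of each tile and calibration, which are omitted here.

```python
import numpy as np

def tiles_to_4d(spectrum_2d, n_u, n_v):
    """Rearrange the n_u x n_v tiles of a 2D spectrum into a 4D array.

    Assumes the spectrum shape is an exact multiple of (n_u, n_v).
    Output axes: (tile row, tile column, row in tile, column in tile).
    """
    H, W = spectrum_2d.shape
    h, w = H // n_u, W // n_v
    # Split rows into (n_u, h) and columns into (n_v, w),
    # then move the two tile indices to the front.
    return spectrum_2d.reshape(n_u, h, n_v, w).transpose(0, 2, 1, 3)

def recover_light_field(sensor, n_u, n_v):
    """Sketch: 2D FFT -> tile rearrangement -> inverse 4D FFT.

    Assumes odd n_u, n_v and even tile sizes, so that after centering
    the zero frequency sits at the center of the middle tile.
    Returns a complex array of shape (n_u, n_v, H // n_u, W // n_v).
    """
    # Center the spectrum so the tiles surround the zero frequency.
    spectrum = np.fft.fftshift(np.fft.fft2(sensor))
    spectrum_4d = tiles_to_4d(spectrum, n_u, n_v)
    # Undo the centering along all four axes, then invert.
    return np.fft.ifftn(np.fft.ifftshift(spectrum_4d))
```

For example, a 12 x 12 sensor image with `n_u = n_v = 3` yields a 3 x 3 x 4 x 4 array: under the assumed layout, two axes index the tile (angular) samples and two index the position within each tile (spatial samples). The sketch trades spatial resolution for angular resolution in the same way the tiling in the text implies.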