Adaptive Sensing and Processing for Some Computer Vision Problems

Thumbnail Image


Publication or External Link





This dissertation is concerned with adaptive sensing and processing in computer vision, specifically through the application of computer vision techniques to non-standard sensors.

In the first part, we adapt techniques designed to solve the classical computer vision problem of gradient-based surface reconstruction to the problem of phase unwrapping that presents itself in applications such as interferometric synthetic aperture radar. Specifically, we propose a new formulation of and solution to the classical two-dimensional phase unwrapping problem. As is usually done, we use the wrapped principal phase gradient field as a measurement of the absolute phase gradient field. Since this model rarely holds in practice, we explicitly enforce integrability of the gradient measurements through a sparse error-correction model. Using a novel energy-minimization functional, we formulate the phase unwrapping task as a generalized lasso problem. We then jointly estimate the absolute phase and the sparse measurement errors using the alternating direction method of multipliers (ADMM) algorithm. Using an interferometric synthetic aperture radar noise model, we evaluate our technique for several synthetic surfaces and compare the results to recently-proposed phase unwrapping techniques. Our method applies new ideas from convex optimization and sparse regularization to this well-studied problem.

In the second part, we consider the problem of controlling and processing measurements from a non-traditional, compressive sensing (CS) camera in real time. We focus on how to control the number of measurements it acquires such that this number remains proportional to the amount of foreground information currently present in the scene under observations. To this end, we provide two novel adaptive-rate CS strategies for sparse, time-varying signals using side information. The first method utilizes extra cross-validation measurements, and the second exploits extra low-resolution measurements. Unlike the majority of current CS techniques, we do not assume that we know an upper bound on the number of significant coefficients pertaining to the images that comprise the video sequence. Instead, we use the side information to predict this quantity for each upcoming image. Our techniques specify a fixed number of spatially-multiplexed CS measurements to acquire, and they adjust this quantity from image to image. Our strategies are developed in the specific context of background subtraction for surveillance video, and we experimentally validate the proposed methods on real video sequences.

Finally, we consider a problem motivated by the application of active pan-tilt-zoom (PTZ) camera control in response to visual saliency. We extend the classical notion of this concept to multi-image data collected using a stationary PTZ camera by requiring consistency: the property that each saliency map in the set of those that are generated should assign the same saliency value to distinct regions of the environment that appear in more than one image. We show that processing each image independently will often fail to provide a consistent measure of saliency, and that using an image mosaic to quantify saliency suffers from several drawbacks. We then propose ray saliency: a mosaic-free method for calculating a consistent measure of bottom-up saliency. Experimental results demonstrating the effectiveness of the proposed approach are presented.