Sparse representation, acquisition and reconstruction of signals, guided by the theory of Compressive Sensing (CS), has become an active research topic over the last few years. Sparse representations effectively capture the idea of parsimony, enabling novel acquisition schemes including sub-Nyquist sampling. Ideas from CS have had significant impact on well-established fields such as signal acquisition, machine learning and statistics, and have also inspired new areas of research such as low-rank matrix completion. In this dissertation we apply CS ideas to low-level computer vision problems. The contribution of this dissertation is to show that CS theory is an important addition to the existing computational toolbox in computer vision and pattern recognition, particularly in data representation and processing.

Additionally, for each of the problems we show how sparse representation improves the modeling of the underlying data, leading to novel applications and a better understanding of existing problems.

In our work, the impact of CS is most felt in the acquisition of videos with novel camera designs. We build prototype cameras that use slow sensors yet capture video at an order of magnitude higher temporal resolution. First, we propose sub-Nyquist acquisition of periodic events and then generalize the idea to capturing regular events. Both cameras operate by first acquiring the video at a slower rate and then computationally recovering the desired higher temporal resolution frames. In our camera, we sense the light with a slow sensor after modulating it with a fluttering shutter and then reconstruct the high-speed video by enforcing its sparsity. Our cameras offer a significant advantage in light efficiency and cost by obviating the need to sense, transfer and store data at a higher frame rate.
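
The acquire-slowly-then-recover idea can be illustrated for a single pixel with a minimal sketch. It is not the dissertation's reconstruction algorithm: the DCT sparsifying basis, the random binary shutter code, the problem sizes, and the use of orthogonal matching pursuit (OMP) are all assumptions chosen for brevity, and the function names are hypothetical. Recovery with such a code is not guaranteed for every signal.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II synthesis basis; column k is the k-th cosine atom."""
    t = np.arange(n)[:, None] + 0.5          # fast-rate sample positions
    k = np.arange(n)[None, :]                # atom (frequency) index
    basis = np.cos(np.pi * t * k / n) * np.sqrt(2.0 / n)
    basis[:, 0] /= np.sqrt(2.0)              # extra scaling for the DC atom
    return basis

def coded_exposure_matrix(n_fast, n_slow, rng):
    """Each slow frame sums the fast samples in its window that the shutter passes."""
    window = n_fast // n_slow
    code = rng.integers(0, 2, size=n_fast)   # assumed random open/closed pattern
    phi = np.zeros((n_slow, n_fast))
    for i in range(n_slow):
        sl = slice(i * window, (i + 1) * window)
        phi[i, sl] = code[sl]
    return phi

def omp(a, y, k):
    """Orthogonal matching pursuit: greedily select k columns of a to explain y."""
    norms = np.linalg.norm(a, axis=0)
    norms[norms == 0] = 1.0                  # avoid dividing by zero
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        corr = np.abs(a.T @ residual) / norms
        if support:
            corr[support] = 0.0              # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(a[:, support], y, rcond=None)
        residual = y - a[:, support] @ coef
    x = np.zeros(a.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
n_fast, n_slow, sparsity = 64, 16, 2         # 4x fewer frames than fast samples
psi = dct_basis(n_fast)
phi = coded_exposure_matrix(n_fast, n_slow, rng)
a = phi @ psi                                # maps sparse coefficients to slow frames

coeffs = np.zeros(n_fast)
coeffs[[3, 9]] = [1.0, -0.5]                 # synthetic 2-sparse test signal
signal = psi @ coeffs                        # high-speed intensity at one pixel
frames = phi @ signal                        # what the slow sensor records
estimate = psi @ omp(a, frames, sparsity)    # attempted high-speed reconstruction
```

Comparing `estimate` against `signal` shows how well this particular code and sparsity level work; a real system would reconstruct whole frames and would likely use a more robust solver.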

Next, we explore the applicability of compressive cameras for computer vision applications in bandwidth-constrained scenarios. We design a compressive camera capable of capturing video using fewer measurements and also separating the foreground from the background. We model surveillance-type videos with two processes, a slower background and a faster but spatially sparse foreground, such that we can recover both of them separately and accurately. By formulating the problem in a distributed CS framework we achieve state-of-the-art video reconstruction and background subtraction. Subsequently, we show that if the camera geometry is provided in a multi-camera setting, the background-subtracted CS images can be used for localizing the object and tracking it by formulating its occupancy in a grid as a sparse reconstruction problem.
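
Why a spatially sparse foreground suits compressive measurements can be seen in a simplified sketch. Because the measurement is linear, subtracting the background's measurements from the current frame's measurements leaves the measurements of the sparse foreground alone, which an l1-regularized solver can then attempt to recover. The Gaussian measurement matrix, the sizes, the regularization weight and the ISTA solver are assumptions for illustration; this is not the distributed CS formulation used in the dissertation.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(phi, y, lam, n_iter=500):
    """Minimize 0.5*||y - phi x||^2 + lam*||x||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(phi, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(phi.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + step * phi.T @ (y - phi @ x), step * lam)
    return x

rng = np.random.default_rng(1)
n_pixels, n_meas = 200, 60                     # far fewer measurements than pixels
phi = rng.standard_normal((n_meas, n_pixels)) / np.sqrt(n_meas)

background = rng.uniform(0.2, 0.8, n_pixels)   # dense, slowly varying scene
foreground = np.zeros(n_pixels)
foreground[[20, 21, 22]] = 0.5                 # small object occupying 3 pixels

y_background = phi @ background                # measurements of the empty scene
y_frame = phi @ (background + foreground)      # measurements of the current frame
diff = y_frame - y_background                  # equals phi @ foreground by linearity

foreground_estimate = ista(phi, diff, lam=0.01)
```

The dense background never has to be reconstructed to locate the object; only the sparse difference is solved for.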

Finally, we apply CS to robust estimation of gradients obtained through photometric stereo and other gradient-based techniques. Since gradient fields are often not integrable, the errors in them need to be estimated and removed. By assuming that the errors, particularly the outliers, are sparse in number, we accurately estimate and remove them. Using conditions on sparse recovery in CS, we characterize the distributions of errors that can be corrected completely and those that can be only partially corrected. We show that our approach has the important property of localizing the effect of errors during integration: other parts of the surface are not affected by errors in the gradients at a particular location.
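
The integrability check behind this idea can be sketched with a discrete curl. On a grid, the gradients of a true surface sum to zero around every elementary square loop, so a single gradient outlier shows up only in the loops that contain it. The forward-difference discretization, the test surface and the function names below are assumptions for illustration, and the sparse error estimation itself is not shown.

```python
import numpy as np

def gradients(z):
    """Forward-difference gradients of a height map z."""
    p = z[:, 1:] - z[:, :-1]     # x-direction differences, shape (H, W-1)
    q = z[1:, :] - z[:-1, :]     # y-direction differences, shape (H-1, W)
    return p, q

def loop_curl(p, q):
    """Sum of gradients around each elementary square loop; zero if integrable."""
    return p[:-1, :] + q[:, 1:] - p[1:, :] - q[:, :-1]

yy, xx = np.mgrid[0:8, 0:8]
z = 0.1 * xx**2 + 0.05 * xx * yy          # synthetic smooth surface
p, q = gradients(z)
clean_curl = loop_curl(p, q)              # vanishes up to rounding error

p_bad = p.copy()
p_bad[3, 4] += 5.0                        # inject one outlier into the x-gradient
corrupted_curl = loop_curl(p_bad, q)

# Only the two loops sharing the corrupted edge have nonzero curl.
affected_loops = np.argwhere(np.abs(corrupted_curl) > 1e-9)
```

Because the curl is a linear function of the gradient errors, the nonzero curl entries act as measurements of a sparse error vector, which is the setting in which sparse recovery conditions apply.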

This dissertation is one of the earliest to investigate the implications of compressive sensing theory for computer vision problems. We hope that this effort will spur more interest among researchers in computer vision, computer graphics, computational photography, statistics and mathematics.