View Synthesis from Image and Video for Object Recognition Applications

Thumbnail Image
umi-umd-4722.pdf(4.47 MB)
No. of downloads: 1086
Publication or External Link
Yue, Zhanfeng
Chellappa, Rama
Object recognition is one of the most important and successful applications in computer vision community. The varying appearances of the test object due to different poses or illumination conditions can make the object recognition problem very challenging. Using view synthesis techniques to generate pose-invariant or illumination-invariant images or videos of the test object is an appealing approach to alleviate the degrading recognition performance due to non-canonical views or lighting conditions. In this thesis, we first present a complete framework for better synthesis and understanding of the human pose from a limited number of available silhouette images. Pose-normalized silhouette images are generated using an active virtual camera and an image based visual hull technique, with the silhouette turning function distance being used as the pose similarity measurement. In order to overcome the inability of the shape from silhouettes method to reonstruct concave regions for human postures, a view synthesis algorithm is proposed for articulating humans using visual hull and contour-based body part segmentation. These two components improve each other for better performance through the correspondence across viewpoints built via the inner distance shape context measurement. Face recognition under varying pose is a challenging problem, especially when illumination variations are also present. We propose two algorithms to address this scenario. For a single light source, we demonstrate a pose-normalized face synthesis approach on a pixel-by-pixel basis from a single view by exploiting the bilateral symmetry of the human face. For more complicated illumination condition, the spherical harmonic representation is extended to encode pose information. An efficient method is proposed for robust face synthesis and recognition with a very compact training set. Finally, we present an end-to-end moving object verification system for airborne video, wherein a homography based view synthesis algorithm is used to simultaneously handle the object's changes in aspect angle, depression angle, and resolution. Efficient integration of spatial and temporal model matching assures the robustness of the verification step. As a byproduct, a robust two camera tracking method using homography is also proposed and demonstrated using challenging surveillance video sequences.