ROBUST REPRESENTATIONS FOR UNCONSTRAINED FACE RECOGNITION AND ITS APPLICATIONS
Face identification and verification are important problems in computer vision and have been actively researched for over two decades. There are several applications including mobile authentication, visual surveillance, social network analysis, and video content analysis. Many algorithms have shown to work well on images collected in controlled settings. However, the performance of these algorithms often degrades significantly on images that have large variations in pose, illumination and expression as well as due to aging, cosmetics, and occlusion. How to extract robust and discriminative feature representations from face images/videos is an important problem to achieve good performance in uncontrolled settings. In this dissertation, we present several approaches to extract robust feature representation from a set of images/video frames for face identification and verification problems. We first present a dictionary approach with dense facial landmark features. Each face video is segmented into K partitions first, and the multi-scale features are extracted from patches centered at detected facial landmarks. Then, compact and representative dictionaries are learned from dense features for each partition of a video and then concatenated together into a video dictionary representation for the video. Experiments show that the representation is effective for the unconstrained video-based face identification task. Secondly, we present a landmark-based Fisher vector approach for video-based face verification problems. This approach encodes over-complete local features into a high-dimensional feature representation followed by a learned joint Bayesian metric to project the feature vector into a low-dimensional space and to compute the similarity score. We then present an automated system for face verification which exploits features from deep convolutional neural networks (DCNN) trained using the CASIA-WebFace dataset. Our experimental results show that the DCNN model is able to characterize the face variations from the large-scale source face dataset and generalizes well to another smaller one. Finally, we also demonstrate that the model pre-trained for face identification and verification tasks encodes rich face information which benefit other face-related tasks with scarce annotated training data. We use apparent age estimation as an example and develop a cascade convolutional neural network framework which consists of age group classification and age regression, and a deep networks is fine-tuned using the target data.