ROBUST FACIAL LANDMARKS LOCALIZATION WITH APPLICATIONS IN FACIAL BIOMETRICS
Abstract
Localization of regions of interest in images and videos is a well-studied problem in the computer vision community. Localization tasks usually involve locating objects in a given image, as in object detection and segmentation. However, the region of interest can be as small as a single pixel, as in facial landmark localization or human pose estimation. This dissertation studies robust facial landmark detection algorithms for faces in the wild using learning methods based on Convolutional Neural Networks.
Detection of specific keypoints on face images is an integral pre-processing step in facial biometrics and numerous other applications, including face verification and identification. Detected keypoints allow face images to be aligned to a canonical coordinate system using geometric transforms such as similarity or affine transformations, mitigating the adverse effects of rotation and scaling. This challenging problem has become more attractive in recent years as a result of advances in deep learning and the release of more unconstrained datasets. The research community continues to push the boundaries of performance on unconstrained images, which are diverse in pose, expression and lighting conditions.
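The alignment step described above can be illustrated with a minimal sketch. It assumes five detected landmarks and a hypothetical canonical template, and fits a 2D similarity transform (scale, rotation, translation) by linear least squares. The function names and the template coordinates are illustrative assumptions and are not taken from the dissertation.

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares 2D similarity transform mapping src points onto dst.

    Uses the parameterization u = a*x - c*y + tx, v = c*x + a*y + ty,
    which covers scale, rotation and translation (no reflection or shear).
    Returns a 2x3 matrix M such that dst ~= src @ M[:, :2].T + M[:, 2].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    A = np.zeros((2 * n, 4))
    # Even rows hold the equations for u, odd rows the equations for v.
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    b = dst.reshape(-1)  # interleaved as u0, v0, u1, v1, ...
    a, c, tx, ty = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.array([[a, -c, tx], [c, a, ty]])

def apply_similarity(M, pts):
    """Apply a 2x3 similarity matrix to an (n, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]

# Hypothetical canonical template: eyes, nose tip, mouth corners (pixels).
template = np.array([[30, 30], [70, 30], [50, 50], [35, 70], [65, 70]], dtype=float)

# Simulate "detected" landmarks on a rotated, scaled and shifted face.
theta, scale = 0.3, 1.7
R = scale * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
detected = template @ R.T + np.array([12.0, -5.0])

M = estimate_similarity(detected, template)
aligned = apply_similarity(M, detected)
```

In practice the same 2x3 matrix would be passed to an image-warping routine so that the whole face crop, not only the landmarks, is mapped into the canonical frame.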
Over the years, researchers have developed various hand-crafted techniques to extract meaningful features from face images, most of them appearance-based and geometry-based. However, these features do not perform well on data collected in unconstrained settings because of large variations in appearance and other nuisance factors. Convolutional Neural Networks (CNNs) have become prominent because of their ability to extract discriminative features. Unlike hand-crafted features, deep CNNs (DCNNs) learn feature extraction and classification from the data itself in an end-to-end fashion. This enables DCNNs to be robust to variations present in the data while improving their discriminative ability.
In this dissertation, we discuss three different methods for facial keypoint detection based on Convolutional Neural Networks. The methods are generic and can be extended to the related problem of keypoint detection for human pose estimation. The first method, called Cascaded Local Deep Descriptor Regression, uses deep features extracted around local points to learn linear regressors that incrementally correct the initial estimate of the keypoints. In the second method, called KEPLER, we develop efficient Heatmap CNNs to directly learn the non-linear mapping between the input and target spaces. We also apply different regularization techniques to tackle the effects of imbalanced data and vanishing gradients. In the third method, we model the spatial correlation between different keypoints using Pose Conditioned Convolution Deconvolution Networks (PCD-CNN), while making the model pose-agnostic by disentangling pose from the face image. Next, we show an application of facial landmark localization: aligning face images for apparent age estimation of humans from unconstrained images.
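To make the heatmap formulation mentioned for the second method concrete, the following is a minimal sketch of how keypoints are commonly encoded as per-keypoint heatmaps and decoded back to coordinates. The Gaussian target, the `sigma` value and the argmax decoding are general conventions assumed here for illustration; the abstract does not specify how KEPLER constructs or decodes its heatmaps.

```python
import numpy as np

def gaussian_heatmap(height, width, center_xy, sigma=1.5):
    """Build a (height, width) target map peaking at center_xy = (x, y).

    The Gaussian shape and sigma are assumptions made for this sketch.
    """
    xs = np.arange(width)
    ys = np.arange(height)[:, None]
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def decode_heatmaps(heatmaps):
    """Return the (x, y) location of the maximum of each heatmap.

    heatmaps has shape (num_keypoints, height, width).
    """
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)

# Two hypothetical keypoints on a 48x64 grid, given as (x, y).
keypoints = np.array([[10, 20], [40, 5]])
targets = np.stack([gaussian_heatmap(48, 64, kp) for kp in keypoints])
decoded = decode_heatmaps(targets)
```

A network trained in this style would regress maps like `targets` from the input image, and `decode_heatmaps` would be applied to its predictions at test time.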
In the fourth part of this dissertation, we discuss the impact of good-quality landmarks on the task of face verification. Previously proposed methods perform with reasonable accuracy on high-resolution, good-quality images, but fail when the input image suffers from degradation. To this end, we propose a semi-supervised method for predicting landmarks in low-quality images. This method learns to predict landmarks in low-resolution images by learning to model the learning process on high-resolution images. In this algorithm, we use Generative Adversarial Networks, which first learn to model the distribution of real low-resolution images, after which another CNN learns to model the distribution of heatmaps on the images. Additionally, we propose another high-quality facial landmark detection method, which achieves state-of-the-art results at the time of writing.
Finally, we discuss extending the ideas developed for facial keypoint localization to human pose estimation, which is one of the important cues for Human Activity Recognition. As in PCD-CNN, the parts of the human body can be modelled in a tree structure, where the relationships between the parts are learnt through convolutions while being conditioned on the 3D pose and orientation. Another interesting avenue for research is extending facial landmark localization to naturally degraded images.