FACIAL EXPRESSION RECOGNITION AND EDITING WITH LIMITED DATA

dc.contributor.advisor: Chellappa, Rama [en_US]
dc.contributor.author: Ding, Hui [en_US]
dc.contributor.department: Electrical Engineering [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2020-10-08T05:33:25Z
dc.date.available: 2020-10-08T05:33:25Z
dc.date.issued: 2020 [en_US]
dc.description.abstract: Over the past five years, methods based on deep features have taken over the computer vision field. While dramatic performance improvements have been achieved for tasks such as face detection and verification, these methods usually need large amounts of annotated data. In practice, not all computer vision tasks have access to large amounts of annotated data. Facial expression analysis is such a task. In this dissertation, we focus on facial expression recognition and editing problems with small datasets. In addition, to cope with challenging conditions like pose and occlusion, we also study unaligned facial attribute detection and occluded expression recognition problems. This dissertation is divided into four parts.

In the first part, we present FaceNet2ExpNet, a novel idea to train a lightweight, high-accuracy classification model for expression recognition with small datasets. We first propose a new distribution function to model the high-level neurons of the expression network. Based on this, a two-stage training algorithm is carefully designed. In the pre-training stage, we train the convolutional layers of the expression net, regularized by the face net. In the refining stage, we append fully-connected layers to the pre-trained convolutional layers and train the whole network jointly. Visualization shows that the model trained with our method captures improved high-level expression semantics. Evaluations on four public expression databases demonstrate that our method achieves better results than state-of-the-art methods.

In the second part, we focus on robust facial expression recognition under occlusion and propose a landmark-guided attention branch to find corrupted feature elements and discard them from recognition. An attention map is first generated to indicate whether a specific facial part is occluded and to guide our model to attend to the non-occluded regions. To further increase robustness, we propose a facial region branch that partitions the feature maps into non-overlapping facial blocks and requires each block to predict the expression independently. Owing to the synergistic effect of the two branches, our occlusion adaptive deep network significantly outperforms state-of-the-art methods on two challenging in-the-wild benchmark datasets and three real-world occluded expression datasets.

In the third part, we propose a cascade network that simultaneously learns to localize face regions specific to attributes and performs attribute classification without alignment. First, a weakly-supervised face region localization network is designed to automatically detect regions (or parts) specific to attributes. Then multiple part-based networks and a whole-image-based network are separately constructed and combined by the region switch layer and attribute relation layer for final attribute classification. A multi-net learning method and hint-based model compression are further proposed to obtain an effective localization model and a compact classification model, respectively. Our approach achieves significantly better performance than state-of-the-art methods on the unaligned CelebA dataset, reducing the classification error by 30.9%.

In the final part of this dissertation, we propose an Expression Generative Adversarial Network (ExprGAN) for photo-realistic facial expression editing with controllable expression intensity. An expression controller module is specially designed to learn an expressive and compact expression code in addition to the encoder-decoder network. This novel architecture enables the expression intensity to be continuously adjusted from low to high. We further show that our ExprGAN can be applied to other tasks, such as expression transfer, image retrieval, and data augmentation for training improved face expression recognition models. To tackle the small size of the training database, an effective incremental learning scheme is proposed. Quantitative and qualitative evaluations on the widely used Oulu-CASIA dataset demonstrate the effectiveness of ExprGAN. [en_US]
dc.identifier: https://doi.org/10.13016/mzuy-5wts
dc.identifier.uri: http://hdl.handle.net/1903/26539
dc.language.iso: en [en_US]
dc.subject.pqcontrolled: Electrical engineering [en_US]
dc.subject.pqcontrolled: Artificial intelligence [en_US]
dc.subject.pqcontrolled: Computer science [en_US]
dc.subject.pquncontrolled: Computer Vision [en_US]
dc.subject.pquncontrolled: Deep Learning [en_US]
dc.subject.pquncontrolled: Facial Attributes [en_US]
dc.subject.pquncontrolled: Facial Expression Editing [en_US]
dc.subject.pquncontrolled: Facial Expression Recognition [en_US]
dc.subject.pquncontrolled: Transfer Learning [en_US]
dc.title: FACIAL EXPRESSION RECOGNITION AND EDITING WITH LIMITED DATA [en_US]
dc.type: Dissertation [en_US]
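
Illustrative note on the abstract's second part: the abstract describes an attention map that marks occluded facial parts and a facial region branch that splits feature maps into non-overlapping blocks. The following is a minimal sketch of those two operations only, under stated assumptions; it is not the dissertation's implementation. The function names (apply_attention, partition_blocks), the 4x4 single-channel map, the binary attention values, and the 2x2 block size are all hypothetical. The actual network works on learned CNN features, derives attention from facial landmarks, and attaches a classifier to each block, none of which is reproduced here.

```python
from typing import List

Grid = List[List[float]]


def apply_attention(features: Grid, attention: Grid) -> Grid:
    """Scale each feature by its attention weight (0 = occluded, 1 = visible)."""
    return [
        [f * a for f, a in zip(f_row, a_row)]
        for f_row, a_row in zip(features, attention)
    ]


def partition_blocks(features: Grid, block_h: int, block_w: int) -> List[Grid]:
    """Split a 2-D map into non-overlapping blocks, listed in row-major order."""
    rows, cols = len(features), len(features[0])
    if rows % block_h or cols % block_w:
        raise ValueError("map size must be a multiple of the block size")
    return [
        [row[c:c + block_w] for row in features[r:r + block_h]]
        for r in range(0, rows, block_h)
        for c in range(0, cols, block_w)
    ]


# Hypothetical 4x4 single-channel feature map.
features = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
# Hypothetical attention map: the lower-right quadrant is treated as occluded.
attention = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]

attended = apply_attention(features, attention)
blocks = partition_blocks(attended, 2, 2)
```

In this toy setup the occluded quadrant becomes an all-zero block, while the other three blocks keep their values, so a per-block predictor (omitted here) would still receive usable input from the visible regions.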

Files

Original bundle
Name: Ding_umd_0117E_20973.pdf
Size: 13.57 MB
Format: Adobe Portable Document Format