Machine Learning of Facial Attributes Using Explainable, Secure and Generative Adversarial Networks
Files
Publication or External Link
Date
Authors
Advisor
Citation
DRUM DOI
Abstract
"Attributes" are referred to abstractions that humans use to group entities and phenomena that have a common characteristic. In machine learning (ML), attributes are fundamental because they bridge the semantic gap between humans and ML systems. Thus, researchers have been using this concept to transform complicated ML systems into interactive ones. However, training the attribute detectors which are central to attribute-based ML systems can still be challenging. It might be infeasible to gather attribute labels for rare combinations to cover all the corner cases, which can result in weak detectors. Also, it is not clear how to fill in the semantic gap with attribute detectors themselves. Finally, it is not obvious how to interpret the detectors' outputs in the presence of adversarial noise.
First, we investigate the effectiveness of attributes for bridging the semantic gap in complicated ML systems. We turn a system that does continuous authentication of human faces on mobile phones into an interactive attribute-based one. We employ deep multi-task learning in conjunction with multi-view classification using facial parts to tackle this problem. We show how the proposed system decomposition enables efficient deployment of deep networks for authentication on mobile phones with limited resources.
Next, we seek to improve the attribute detectors by using conditional image synthesis. We take a generative modeling approach for manipulating the semantics of a given image to provide novel examples. Previous works condition the generation process on binary attribute existence values. We take this type of approaches one step further by modeling each attribute as a distributed representation in a vector space. These representations allow us to not only toggle the presence of attributes but to transfer an attribute style from one image to the other. Furthermore, we show diverse image generation from the same set of conditions, which was not possible using existing methods with a single dimension per attribute.
We then investigate filling in the semantic gap between humans and attribute classifiers by proposing a new way to explain the pre-trained attribute detectors. We use adversarial training in conjunction with an encoder-decoder model to learn the behavior of binary attribute classifiers. We show that after our proposed model is trained, one can see which areas of the image contribute to the presence/absence of the target attribute, and also how to change image pixels in those areas so that the attribute classifier decision changes in a consistent way with human perception.
Finally, we focus on protecting the attribute models from un-interpretable behaviors provoked by adversarial perturbations. These behaviors create an inexplainable semantic gap since they are visually unnoticeable. We propose a method based on generative adversarial networks to alleviate this issue. We learn the training data distribution that is used to train the core classifier and use it to detect and denoise test samples. We show that the method is effective for defending facial attribute detectors.