Adversarial Vulnerabilities of Deep Networks and Affordable Robustness
Goldstein, Thomas A
Deep learning has improved the performance of many computer vision tasks. However, the features learned without extra regularization are not necessarily interpretable. Although conventionally trained models generalize well, they are susceptible to certain failure modes. These catastrophic failures rarely occur naturally, but an adversary with some knowledge of the design process can engineer them. Depending on when the adversary manipulates the system, threats can be classified as evasion attacks (at inference time) or data poisoning attacks (at training time).

First, we cover a recently proposed data poisoning threat model that does not assume the adversary controls the labeling process. We call this attack the “targeted clean-label” poisoning attack. It causes a target instance to be misclassified under both end-to-end training and transfer learning, without degrading the classifier's overall performance on non-target examples.

We then shift our focus to evasion attacks, considering two types of inference-time attacks: universal perturbations and adversarial examples. For universal perturbations, we present an efficient method for generating them, and we propose universal adversarial training as a defense against them.

In the last part of this dissertation, we present methods for training per-instance robust models when resources are limited. One such limitation is scarce computing power. For this case, we present an algorithm called “Adversarial Training for Free!”, which trains robust models at the same computational cost as conventional (natural) training. It achieves this efficiency by updating the network parameters and the adversarial perturbation simultaneously. Another limitation is scarce training data per class. For this case, we introduce adversarially robust transfer learning.
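The simultaneous update behind “Adversarial Training for Free!” can be illustrated with a small sketch. It is an illustrative assumption, not the dissertation's implementation. It replaces the deep network with a linear logistic-regression model in NumPy, assumes an L∞ perturbation budget `eps`, and uses hypothetical helper names (`logistic_grads`, `free_adversarial_training`). Each minibatch is replayed `replays` times. Every replay computes a single gradient and uses it twice: once for a descent step on the model parameters and once for a signed-gradient ascent step on the perturbation. The perturbation is then carried over to the next minibatch rather than reset.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_grads(w, b, x, y):
    """Mean binary cross-entropy of a linear model (toy stand-in for a deep net).

    Returns gradients w.r.t. the weights w, the bias b, and the inputs x,
    all obtained from one forward/backward computation.
    """
    p = sigmoid(x @ w + b)        # predicted probabilities, shape (n,)
    r = (p - y) / len(y)          # dLoss/dlogit for the mean loss
    grad_w = x.T @ r
    grad_b = r.sum()
    grad_x = np.outer(r, w)       # dLoss/dx for every example, shape (n, d)
    return grad_w, grad_b, grad_x


def free_adversarial_training(x, y, eps=0.1, lr=0.5, replays=4,
                              epochs=20, batch=32, seed=0):
    """Sketch of 'free' adversarial training on a hypothetical linear model."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w, b = np.zeros(d), 0.0
    # The perturbation persists across minibatches (warm start), not reset to 0.
    delta = np.zeros((batch, d))
    # Each minibatch is replayed `replays` times, so the number of passes is
    # divided by `replays` to keep the gradient budget of natural training.
    for _ in range(max(1, epochs // replays)):
        order = rng.permutation(n)
        for start in range(0, n - batch + 1, batch):   # drop the last partial batch
            idx = order[start:start + batch]
            xb, yb = x[idx], y[idx]
            for _ in range(replays):
                gw, gb, gx = logistic_grads(w, b, xb + delta, yb)
                # Descent step on the parameters...
                w -= lr * gw
                b -= lr * gb
                # ...and ascent step on the perturbation from the same gradient,
                # projected back onto the L-infinity ball of radius eps.
                delta = np.clip(delta + eps * np.sign(gx), -eps, eps)
    return w, b
```

Because each replay reuses one gradient computation for both updates, and the pass count is divided by `replays`, the total number of gradient computations matches natural training. In a deep network, that gradient would come from a single backward pass through the model.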