ROBUSTNESS AND UNDERSTANDABILITY OF DEEP MODELS

Date

2022

Abstract

Deep learning has made considerable leaps over the past few decades, progressing from promising models for various problems to the state of the art. However, unlike classical machine learning models, deep models are often difficult to interpret: it is hard to explain why and how they make their decisions, and their performance can drop sharply when small amounts of noise are added to their input. In short, deep learning models perform well and beat human beings at many tasks, yet they are easily corrupted and hard to understand. Consequently, improving these deep models requires a deep understanding of them.

While deep learning models usually generalize well to unseen data, adding a negligible amount of noise to their input can flip their decision. This phenomenon is known as an "adversarial attack." In this thesis, we study several defense methods against such adversarial attacks. More specifically, we focus on defenses that, unlike traditional methods, require less computation or fewer training examples. We also show that, despite recent improvements in adversarial defenses, even provably certified defenses can be broken. Moreover, we revisit regularization as a means of improving adversarial robustness.
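
To make the notion of an adversarial perturbation concrete, the sketch below shows a minimal one-step, FGSM-style attack in PyTorch. It is only an illustration of how a small, sign-based perturbation can change a model's prediction; the model, labels, and perturbation budget epsilon are assumptions, and it is not one of the specific attacks or defenses developed in this thesis.

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, x, y, epsilon=8 / 255):
        # Hypothetical illustration: one-step FGSM-style perturbation.
        # model, x (images in [0, 1]), y (labels), and epsilon are assumed inputs.
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # Step in the direction that increases the loss, bounded by epsilon.
        x_adv = x + epsilon * x.grad.sign()
        return x_adv.clamp(0, 1).detach()

Comparing the model's predictions on x and x_adv shows whether its decision flips under such a small perturbation; a defense is typically judged by how often the prediction survives perturbations of this kind.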

Over the past years, many techniques have been developed for understanding and explaining how deep neural networks make decisions. This thesis introduces new methods for studying the building blocks of those decisions. First, we introduce Plug-In Inversion, a new method for inverting and visualizing deep neural network architectures, including Vision Transformers (ViTs). We then study the features a ViT learns in order to make a decision, comparing the features learned when the network is trained on labeled data with those learned under a language model's supervision, as in CLIP. Last, we introduce feature sonification, which adapts feature visualization techniques to study models trained for speech recognition, a non-vision task.
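
For readers unfamiliar with inversion-style visualization, the sketch below shows the generic input-optimization idea this family of methods builds on: starting from a random image and ascending the gradient of a chosen class logit. It is only an illustration under assumed hyperparameters (optimizer, step count, image size); it is not the Plug-In Inversion procedure itself.

    import torch

    def visualize_class(model, target_class, steps=256, lr=0.05, size=224):
        # Generic inversion-by-optimization sketch (assumed hyperparameters).
        # Not the Plug-In Inversion method; just the underlying idea of
        # optimizing an input so that it excites a chosen output of the network.
        x = torch.rand(1, 3, size, size, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            logit = model(x.clamp(0, 1))[0, target_class]
            (-logit).backward()  # gradient ascent on the target logit
            opt.step()
        return x.clamp(0, 1).detach()

Feature sonification can be thought of as applying the same optimize-the-input recipe to a speech model, so the resulting artifact is audio rather than an image.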
