Understanding and Improving Reliability of Predictive and Generative Deep Learning Models
Abstract
Deep learning models are prone to acquiring spurious correlations and biases during training and are vulnerable to adversarial attacks during inference. In the context of predictive models, this results in inaccurate predictions that rely on spurious features. Our research examines this phenomenon specifically for objects placed in uncommon settings, where they are not conventionally found in the real world (e.g., a plane on water or a television in a cave). We introduce the "FOCUS: Familiar Objects in Common and Uncommon Settings" dataset, which stress-tests the generalization capabilities of deep image classifiers. By leveraging modern search engines, we deliberately gather data containing objects in common and uncommon settings across a wide range of locations, weather conditions, and times of day. Our comprehensive analysis of popular image classifiers on the FOCUS dataset reveals a noticeable decline in performance when classifying images in atypical scenarios. FOCUS consists only of natural images, which are extremely challenging to collect because, by definition, objects are rarely found in unusual settings. To address this challenge, we introduce an alternative dataset named Diffusion Dreamed Distribution Shifts (D3S). D3S comprises synthetic images generated with Stable Diffusion, using text prompts and image guides derived by placing a sample foreground image onto a background template image. This scalable approach allows us to create 120,000 images featuring objects from all 1000 ImageNet classes set against 10 diverse backgrounds. Owing to the photorealism of the diffusion model, our images are much closer to natural images than those in previous synthetic datasets.
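The D3S generation step can be illustrated with a short sketch. The example below uses the Hugging Face diffusers library: a foreground cutout is pasted onto a background template to form an image guide, and Stable Diffusion's image-to-image pipeline then synthesizes a photorealistic image from that guide and a matching text prompt. The model ID, file names, placement, and strength value are assumptions for illustration, not the exact D3S pipeline.

    from PIL import Image
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline

    # Load a Stable Diffusion image-to-image pipeline (model ID is illustrative).
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    def compose_guide(foreground, background, size=512):
        """Paste a foreground cutout onto a background template to form an image guide."""
        guide = background.convert("RGB").resize((size, size))
        fg = foreground.convert("RGBA").resize((size // 2, size // 2))
        guide.paste(fg, (size // 4, size // 2), fg)  # the alpha channel acts as the paste mask
        return guide

    # Hypothetical file names; in D3S these come from foreground and background image pools.
    guide = compose_guide(Image.open("dog.png"), Image.open("beach.jpg"))
    prompt = "a photo of a dog on a beach"  # text prompt pairing the class with the background
    image = pipe(prompt=prompt, image=guide, strength=0.7, guidance_scale=7.5).images[0]
    image.save("d3s_sample.png")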
To alleviate this problem, we propose two methods of learning richer and more robust image representations. In the first approach, we harness the foreground and background labels in D3S to learn a foreground (background) representation that is resistant to changes in the background (foreground). This is achieved by penalizing the mutual information between the foreground (background) features and the background (foreground) labels. We demonstrate the efficacy of these representations by training classifiers on a task with strong spurious correlations. In the second approach, we use embeddings of objects and of their relationships, extracted with off-the-shelf image segmentation models and text encoders respectively, as input tokens to a transformer. This yields substantially richer features that improve performance on downstream tasks such as image retrieval. Thus far, our focus has centered on predictive models, scrutinizing the robustness of the object representations they learn, particularly when the contextual surroundings are unconventional.
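For the first approach, the sketch below shows one common way to realize the mutual-information penalty, assuming each D3S image carries both a foreground label and a background label. The penalty is approximated here with an adversarial background head behind a gradient-reversal layer; the exact estimator used in our method may differ, and all names and dimensions are illustrative.

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        """Identity in the forward pass; flips (and scales) gradients in the backward pass."""
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)
        @staticmethod
        def backward(ctx, grad):
            return -ctx.lam * grad, None

    class ForegroundModel(nn.Module):
        def __init__(self, backbone, feat_dim, n_fg, n_bg, lam=1.0):
            super().__init__()
            self.backbone = backbone            # e.g., a ResNet trunk without its classifier head
            self.fg_head = nn.Linear(feat_dim, n_fg)
            self.bg_head = nn.Linear(feat_dim, n_bg)
            self.lam = lam

        def forward(self, x):
            z = self.backbone(x)                # foreground representation
            fg_logits = self.fg_head(z)
            bg_logits = self.bg_head(GradReverse.apply(z, self.lam))
            return fg_logits, bg_logits

    def loss_fn(fg_logits, bg_logits, y_fg, y_bg):
        ce = nn.functional.cross_entropy
        # The reversed gradient pushes the backbone to remove background information from z,
        # acting as a proxy for penalizing the mutual information between z and the background label.
        return ce(fg_logits, y_fg) + ce(bg_logits, y_bg)

For the second approach, a minimal sketch of the token interface is shown below, assuming that per-object embeddings (e.g., from an off-the-shelf segmentation model followed by an image encoder) and relationship-text embeddings (e.g., from a text encoder) have already been computed; the class name, dimensions, and pooling choice are illustrative.

    import torch
    import torch.nn as nn

    class SceneTokenTransformer(nn.Module):
        def __init__(self, dim=512, n_layers=4, n_heads=8):
            super().__init__()
            self.cls = nn.Parameter(torch.zeros(1, 1, dim))   # learned pooling token
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

        def forward(self, object_tokens, relation_tokens):
            # object_tokens: (B, N_obj, dim); relation_tokens: (B, N_rel, dim)
            tokens = torch.cat([self.cls.expand(object_tokens.size(0), -1, -1),
                                object_tokens, relation_tokens], dim=1)
            out = self.encoder(tokens)
            return out[:, 0]   # pooled scene representation for retrieval and other downstream tasks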
Large language models are also prone to failures during inference. Given the widespread use of LLMs, it is crucial to understand how readily these models fail under adversarial inputs. To that end, we propose BEAST, a family of fast adversarial attacks that use beam search to append adversarial tokens to a given input prompt. These attacks induce hallucinations, jailbreak the models, and enable unintended membership inference from model outputs. The attacks are fast and can be executed in relatively compute-constrained environments.
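A minimal sketch of a BEAST-style attack is shown below, assuming white-box access to a Hugging Face causal language model. Candidate adversarial tokens are drawn from the model's own next-token distribution, and beams are ranked by an adversarial objective; the objective used here, the log-likelihood of a fixed target string (a jailbreak-style goal), is an illustrative stand-in rather than the exact BEAST formulation, and the model name and hyperparameters are assumptions.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")          # illustrative small model
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def target_logprob(prefix_ids, target_ids):
        """Log-probability of the target continuation given prompt + adversarial suffix."""
        ids = torch.cat([prefix_ids, target_ids])
        with torch.no_grad():
            logits = model(ids.unsqueeze(0)).logits[0]
        logprobs = torch.log_softmax(logits[:-1], dim=-1)
        # the last len(target_ids) rows predict exactly the target tokens
        return logprobs[-target_ids.numel():].gather(1, target_ids.unsqueeze(1)).sum()

    def beast_attack(prompt, target, suffix_len=10, beam_width=5, cand_per_beam=20):
        """Greedy beam search over appended adversarial tokens (simplified, uncached)."""
        prompt_ids = tok(prompt, return_tensors="pt").input_ids[0]
        target_ids = tok(target, return_tensors="pt").input_ids[0]
        beams = [(prompt_ids, 0.0)]
        for _ in range(suffix_len):
            candidates = []
            for ids, _ in beams:
                # propose candidate next tokens from the model's own next-token distribution
                with torch.no_grad():
                    next_logits = model(ids.unsqueeze(0)).logits[0, -1]
                for t in torch.topk(next_logits, cand_per_beam).indices:
                    new_ids = torch.cat([ids, t.view(1)])
                    score = target_logprob(new_ids, target_ids).item()
                    candidates.append((new_ids, score))
            beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_width]
        best_ids, best_score = beams[0]
        return tok.decode(best_ids[prompt_ids.numel():]), best_score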