Towards Trustworthy Machine Learning Systems
Abstract
Machine learning (ML) has achieved rapid progress in recent years. At the same time, the growing capability of AI/ML also raises concerns about its reliability and societal impact. This dissertation investigates the robustness and trustworthiness of a spectrum of ML systems: from discriminative to generative models, from model training to data collection, and from vision to language. It covers three aspects of trustworthy AI/ML: reliability under data distribution shifts, interpretability, and data security.

First, reliability under data distribution shifts is essential for deploying ML models in real-world scenarios. Multiple factors can affect the data distribution at test time, and these shifts are unknown during training. To account for unknown distribution shifts at test time, this dissertation first studies several data augmentation techniques inspired by adversarial optimization. We start with an image-space data augmentation method and then extend the idea to the feature representation space. In addition, we examine multi-modal vision-language models and propose to address limitations in their textual context to improve zero-shot generalization.
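To make the adversarial-optimization-inspired augmentation idea concrete, the snippet below is a minimal sketch, not the dissertation's actual method: it assumes a PyTorch classifier and generates perturbed copies of a training batch by taking a few gradient-ascent steps on the loss within a small L-infinity ball, which can then be mixed back into training. The function name, step sizes, and training-loop comments are illustrative only.

```python
import torch
import torch.nn.functional as F

def adversarial_augment(model, images, labels, eps=4/255, step=1/255, iters=3):
    """Sketch: produce adversarially perturbed copies of a training batch.

    A few projected-gradient-ascent steps maximize the training loss within
    an L-infinity ball of radius `eps` around the clean images; the perturbed
    images can then be added to the training set as augmented samples.
    """
    delta = torch.zeros_like(images, requires_grad=True)
    for _ in range(iters):
        loss = F.cross_entropy(model(images + delta), labels)
        loss.backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()   # ascend the loss
            delta.clamp_(-eps, eps)             # stay inside the L-inf ball
            delta.grad.zero_()
    return (images + delta).detach()

# Hypothetical training-loop usage: augment each batch with its adversarial copy.
# for images, labels in loader:
#     aug = adversarial_augment(model, images, labels)
#     batch, targets = torch.cat([images, aug]), torch.cat([labels, labels])
#     optimizer.zero_grad()
#     F.cross_entropy(model(batch), targets).backward()
#     optimizer.step()
```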
Second, interpretability is a desired property of trustworthy ML systems, as it makes black-box ML models more accountable. Interpretability tools also help practitioners debug and improve their models. Saliency maps are a commonly used technique for visualizing how much each input part contributes to an ML model's output. In this dissertation, we broaden the idea of saliency maps by proposing a parameter-space saliency method that identifies how much each sub-model contributes to an ML model's output, and we discuss several applications of this saliency tool for identifying and correcting failure modes of off-the-shelf ML models.
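For readers unfamiliar with saliency maps, the following is a minimal illustrative sketch of the general idea, assuming a PyTorch classifier; it is not the dissertation's parameter-space formulation. The first function is a standard gradient-based input saliency map; the second shows the analogous shift to parameter space by aggregating gradient magnitudes per named parameter group. All function names are hypothetical.

```python
import torch
import torch.nn.functional as F

def input_saliency(model, image, label):
    """Standard gradient saliency: |d loss / d pixel| per input location.

    Assumes `image` is a CHW tensor and `label` is a 0-dim class index.
    """
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image.unsqueeze(0)), label.unsqueeze(0))
    loss.backward()
    return image.grad.abs().max(dim=0).values  # collapse channels -> H x W map

def parameter_saliency(model, images, labels):
    """Parameter-space analogue: average |gradient| per parameter group
    (e.g., per layer), indicating which parts of the model drive the loss."""
    model.zero_grad()
    F.cross_entropy(model(images), labels).backward()
    return {name: p.grad.abs().mean().item()
            for name, p in model.named_parameters() if p.grad is not None}
```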
Lastly, we study a more adverse scenario in which ML systems are at risk of being attacked or compromised by adversaries. With the rapid progress of foundation models, especially large language models (LLMs), interest in ML safety and in the ethical concerns around their applications has surged. Given the generative ability of recent large models, potential misuse becomes more harmful. In this dissertation, we delve into possible exploitations of LLMs via data tampering and discuss their societal impacts.