Trust through Transparency: Towards Reliable AI for All

Advisor

Feizi, Soheil

Abstract

Seemingly performant models can break down in unexpected and uneven ways, from image classifiers failing to recognize an otter out of water, to LLMs being nearly three times worse at recalling facts about Somalia than about Sweden. In this dissertation, I’ll detail interpretability techniques to scalably illuminate and efficiently intervene on discovered model deficiencies. First, I’ll present evidence, by way of carefully constructed benchmarks, that vision models pervasively rely on spurious correlations. Then, I’ll automate these approaches, demonstrating the power of leveraging auxiliary models to organize data more efficiently, towards uncovering and articulating subsets where models struggle. Finally, I’ll show how these same techniques can be applied to mitigate real-world geographic disparities and even tackle sociotechnical challenges like artistic copyright infringement. In general, it can be difficult to trust what we do not fully understand, especially when unexpected failures arise. By scalably identifying failure modes before they cause harm, we enhance transparency around model abilities and limitations, thus better informing when models can be trusted to work reliably for all.
