ENHANCING TRUSTWORTHINESS AND SAFETY IN FOUNDATION MODELS

dc.contributor.advisor: Huang, Heng
dc.contributor.author: Wu, Yihan
dc.contributor.department: Computer Science
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2026-01-27T06:37:13Z
dc.date.issued: 2025
dc.description.abstract: The rapid progress of foundation models has driven breakthroughs in computer vision, language, and speech generation. Yet their widespread deployment also introduces critical challenges of trustworthiness, robustness, and safety. This dissertation advances theoretical foundations and practical techniques to enhance the reliability of foundation models across classification and multi-modal generation tasks.

In the first part, we focus on classification. We introduce RetrievalGuard, the first provably robust method for 1-nearest-neighbor image retrieval, ensuring resistance against adversarial manipulation. We further propose adversarial weight perturbation to improve the generalization of graph neural networks under adversarial conditions, and develop a law of robustness beyond isoperimetry to establish a new theoretical framework for understanding robustness guarantees.

The second part addresses trustworthiness in language-based generation. We design resilient watermarking techniques that preserve the statistical distribution of large language models while ensuring accessibility and detectability, including a distribution-preserving watermark and an unbiased watermark framework. We also study the vulnerabilities of these systems through De-mark, a systematic watermark removal attack, highlighting critical risks and guiding future defense design.

The third part extends trustworthiness to multi-modal generation. We propose watermarking schemes tailored to order-agnostic language models and auto-regressive speech generation models, bridging theoretical guarantees with practical imperceptibility. In particular, we demonstrate robust and distortion-free watermarks for speech generation, marking one of the first principled approaches to secure audio foundation models against misuse.

Together, these contributions form a comprehensive agenda for enhancing the trustworthiness and safety of foundation models. By unifying robustness theory with practical watermarking, this dissertation provides both provable insights and deployable mechanisms, advancing the development of responsible and reliable AI systems.
dc.identifier: https://doi.org/10.13016/dt7q-os0o
dc.identifier.uri: http://hdl.handle.net/1903/35044
dc.language.iso: en
dc.subject.pqcontrolled: Computer science
dc.subject.pquncontrolled: Large Language Models
dc.subject.pquncontrolled: Machine Learning
dc.subject.pquncontrolled: Watermarking
dc.title: ENHANCING TRUSTWORTHINESS AND SAFETY IN FOUNDATION MODELS
dc.type: Dissertation

Files

Original bundle

Name: Wu_umd_0117E_25625.pdf
Size: 4.1 MB
Format: Adobe Portable Document Format