ENHANCING TRUSTWORTHINESS AND SAFETY IN FOUNDATION MODELS

dc.contributor.advisor: Huang, Heng
dc.contributor.author: Wu, Yihan
dc.contributor.department: Computer Science
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2026-01-27T06:37:13Z
dc.date.issued: 2025
dc.description.abstract: The rapid progress of foundation models has driven breakthroughs in computer vision, language, and speech generation. Yet their widespread deployment also introduces critical challenges of trustworthiness, robustness, and safety. This dissertation advances theoretical foundations and practical techniques to enhance the reliability of foundation models across classification and multi-modal generation tasks.

In the first part, we focus on classification. We introduce RetrievalGuard, the first provably robust method for 1-nearest-neighbor image retrieval, ensuring resistance against adversarial manipulation. We further propose adversarial weight perturbation to improve the generalization of graph neural networks under adversarial conditions, and develop a law of robustness beyond isoperimetry to establish a new theoretical framework for understanding robustness guarantees.

The second part addresses trustworthiness in language-based generation. We design resilient watermarking techniques that preserve the statistical distribution of large language models while ensuring accessibility and detectability, including a distribution-preserving watermark and an unbiased watermark framework. We also study the vulnerabilities of these systems through De-mark, a systematic watermark removal attack, highlighting critical risks and guiding future defense design.

The third part extends trustworthiness to multi-modal generation. We propose watermarking schemes tailored to order-agnostic language models and auto-regressive speech generation models, bridging theoretical guarantees with practical imperceptibility. In particular, we demonstrate robust and distortion-free watermarks for speech generation, marking one of the first principled approaches to secure audio foundation models against misuse.

Together, these contributions form a comprehensive agenda for enhancing the trustworthiness and safety of foundation models. By unifying robustness theory with practical watermarking, this dissertation provides both provable insights and deployable mechanisms, advancing the development of responsible and reliable AI systems.
dc.identifier: https://doi.org/10.13016/dt7q-os0o
dc.identifier.uri: http://hdl.handle.net/1903/35044
dc.language.iso: en
dc.subject.pqcontrolled: Computer science
dc.subject.pquncontrolled: Large Language Models
dc.subject.pquncontrolled: Machine Learning
dc.subject.pquncontrolled: Watermarking
dc.title: ENHANCING TRUSTWORTHINESS AND SAFETY IN FOUNDATION MODELS
dc.type: Dissertation

Files

Original bundle

Name: Wu_umd_0117E_25625.pdf
Size: 4.1 MB
Format: Adobe Portable Document Format