Aligning AI with Human Values: A Path Towards Trustworthy Machine Learning Systems

dc.contributor.advisor: Huang, Furong
dc.contributor.author: Xu, Yuancheng
dc.contributor.department: Applied Mathematics and Scientific Computation
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2025-08-08T11:53:18Z
dc.date.issued: 2025
dc.description.abstract: Machine learning has become a powerful tool for harnessing vast amounts of data across diverse applications. However, as artificial intelligence (AI) technologies advance and become more deeply integrated into daily life, they also introduce risks such as malicious exploitation, misinformation, and unfair decision-making, which can undermine their reliability and ethical integrity. Given AI’s growing influence, ensuring that these systems are trustworthy and aligned with human values is essential for their responsible and safe deployment. To address these challenges, this dissertation investigates trustworthiness across the AI pipeline, focusing on training-time vulnerabilities, inference-time robustness and alignment, and the long-term impacts of decision-making models. At the training stage, it examines how manipulated training data can compromise vision-language models, facilitating the spread of coherent misinformation. At the inference stage, it develops methods to enhance adversarial robustness in image classifiers and to align frozen language models with human values at test time through reward guidance. For the long-term impact, it formulates fairness in sequential decision-making and proposes strategies to mitigate bias accumulation over time. Together, these contributions aim to provide a holistic framework for improving AI reliability, safety, and fairness, fostering more trustworthy and responsible AI deployment.
dc.identifier: https://doi.org/10.13016/ygyc-ex7e
dc.identifier.uri: http://hdl.handle.net/1903/34142
dc.language.iso: en
dc.subject.pqcontrolled: Computer science
dc.subject.pqcontrolled: Applied mathematics
dc.subject.pquncontrolled: adversarial learning
dc.subject.pquncontrolled: foundation models
dc.subject.pquncontrolled: large language model alignment
dc.subject.pquncontrolled: robustness
dc.subject.pquncontrolled: trustworthy machine learning
dc.title: Aligning AI with Human Values: A Path Towards Trustworthy Machine Learning Systems
dc.type: Dissertation

Files

Original bundle

Name: Xu_umd_0117E_24997.pdf
Size: 13.35 MB
Format: Adobe Portable Document Format