Aligning AI with Human Values: A Path Towards Trustworthy Machine Learning Systems
| dc.contributor.advisor | Huang, Furong | en_US |
| dc.contributor.author | Xu, Yuancheng | en_US |
| dc.contributor.department | Applied Mathematics and Scientific Computation | en_US |
| dc.contributor.publisher | Digital Repository at the University of Maryland | en_US |
| dc.contributor.publisher | University of Maryland (College Park, Md.) | en_US |
| dc.date.accessioned | 2025-08-08T11:53:18Z | |
| dc.date.issued | 2025 | en_US |
| dc.description.abstract | Machine learning has become a powerful tool for harnessing vast amounts of data across diverse applications. However, as artificial intelligence (AI) technologies advance and become more deeply integrated into daily life, they also introduce risks such as malicious exploitation, misinformation, and unfair decision-making, which can undermine their reliability and ethical integrity. Given AI’s growing influence, ensuring that these systems are trustworthy and aligned with human values is essential for their responsible and safe deployment. To address these challenges, this dissertation investigates trustworthiness across the AI pipeline, focusing on training-time vulnerabilities, inference-time robustness and alignment, and the long-term impacts of decision-making models. At the training stage, it examines how manipulated training data can compromise vision-language models, facilitating the spread of coherent misinformation. At the inference stage, it develops methods to enhance adversarial robustness in image classifiers and align frozen language models with human values at test time through reward guidance. For the long-term impact, it formulates fairness in sequential decision-making and proposes strategies to mitigate bias accumulation over time. Together, this dissertation aims to provide a holistic framework for improving AI reliability, safety, and fairness, fostering more trustworthy and responsible AI deployment. | en_US |
| dc.identifier | https://doi.org/10.13016/ygyc-ex7e | |
| dc.identifier.uri | http://hdl.handle.net/1903/34142 | |
| dc.language.iso | en | en_US |
| dc.subject.pqcontrolled | Computer science | en_US |
| dc.subject.pqcontrolled | Applied mathematics | en_US |
| dc.subject.pquncontrolled | adversarial learning | en_US |
| dc.subject.pquncontrolled | foundation models | en_US |
| dc.subject.pquncontrolled | large language model alignment | en_US |
| dc.subject.pquncontrolled | robustness | en_US |
| dc.subject.pquncontrolled | trustworthy machine learning | en_US |
| dc.title | Aligning AI with Human Values: A Path Towards Trustworthy Machine Learning Systems | en_US |
| dc.type | Dissertation | en_US |