Aligning AI with Human Values: A Path Towards Trustworthy Machine Learning Systems
| dc.contributor.advisor | Huang, Furong | en_US |
| dc.contributor.author | Xu, Yuancheng | en_US |
| dc.contributor.department | Applied Mathematics and Scientific Computation | en_US |
| dc.contributor.publisher | Digital Repository at the University of Maryland | en_US |
| dc.contributor.publisher | University of Maryland (College Park, Md.) | en_US |
| dc.date.accessioned | 2025-08-08T11:53:18Z | |
| dc.date.issued | 2025 | en_US |
| dc.description.abstract | Machine learning has become a powerful tool for harnessing vast amounts of data across diverse applications. However, as artificial intelligence (AI) technologies advance and become more deeply integrated into daily life, they also introduce risks such as malicious exploitation, misinformation, and unfair decision-making, which can undermine their reliability and ethical integrity. Given AI’s growing influence, ensuring that these systems are trustworthy and aligned with human values is essential for their responsible and safe deployment. To address these challenges, this dissertation investigates trustworthiness across the AI pipeline, focusing on training-time vulnerabilities, inference-time robustness and alignment, and the long-term impacts of decision-making models. At the training stage, it examines how manipulated training data can compromise vision-language models, facilitating the spread of coherent misinformation. At the inference stage, it develops methods to enhance adversarial robustness in image classifiers and align frozen language models with human values at test time through reward guidance. For the long-term impact, it formulates fairness in sequential decision-making and proposes strategies to mitigate bias accumulation over time. Together, this dissertation aims to provide a holistic framework for improving AI reliability, safety, and fairness, fostering more trustworthy and responsible AI deployment. | en_US |
| dc.identifier | https://doi.org/10.13016/ygyc-ex7e | |
| dc.identifier.uri | http://hdl.handle.net/1903/34142 | |
| dc.language.iso | en | en_US |
| dc.subject.pqcontrolled | Computer science | en_US |
| dc.subject.pqcontrolled | Applied mathematics | en_US |
| dc.subject.pquncontrolled | adversarial learning | en_US |
| dc.subject.pquncontrolled | foundation models | en_US |
| dc.subject.pquncontrolled | large language model alignment | en_US |
| dc.subject.pquncontrolled | robustness | en_US |
| dc.subject.pquncontrolled | trustworthy machine learning | en_US |
| dc.title | Aligning AI with Human Values: A Path Towards Trustworthy Machine Learning Systems | en_US |
| dc.type | Dissertation | en_US |