A Multifaceted Quantification of Bias in Large Language Models
Files
Publication or External Link
Date
Authors
Advisor
Citation
DRUM DOI
Abstract
Language models are rapidly developing, demonstrating impressive capabilities in comprehending, generating, and manipulating text. As they advance, they unlock diverse applications across various domains and become increasingly integrated into our daily lives. Nevertheless, these models, trained on vast and unfiltered datasets, come with a range of potential drawbacks and ethical issues. One significant concern is the potential amplification of biases present in the training data, generating stereotypes and reinforcing societal injustices when language models are deployed. In this work, we propose methods to quantify biases in large language models. We examine stereotypical associations for a wide variety of social groups characterized by both single and intersectional identities. Additionally, we propose a framework for measuring stereotype leakage across different languages within multilingual large language models. Finally, we introduce an algorithm that allows us to optimize human data collection in conditions of high levels of human disagreement.