Evolving Language Patterns on the Web: Community Influence, Model Adaptation, and Bias Mitigation
dc.contributor.advisor | Ai, Wei | en_US |
dc.contributor.author | Zhou, Yuhang | en_US |
dc.contributor.department | Information Studies | en_US |
dc.contributor.publisher | Digital Repository at the University of Maryland | en_US |
dc.contributor.publisher | University of Maryland (College Park, Md.) | en_US |
dc.date.accessioned | 2025-08-08T12:08:37Z | |
dc.date.issued | 2025 | en_US |
dc.description.abstract | The language of the Web is in constant evolution, shaped by the dynamic interplay between social interaction, symbolic innovation, and shifting cultural norms. Online communities actively drive this evolution by introducing novel expressions, reinterpreting existing tokens, and constructing meanings that challenge traditional linguistic assumptions. Among the most illustrative of these transformations are emojis—visual symbols whose interpretations vary widely across users and contexts. As emojis evolve from standardized Unicode definitions into contextually rich social symbols, they reveal the complexity and fluidity of digital communication. However, this rapid pace of linguistic change presents major challenges for both human communication and natural language processing (NLP) systems, which struggle to adapt to the semantic drift of tokens over time. This dissertation investigates the interconnections between language evolution, social meaning construction, and computational modeling. It centers on three key areas that reflect different facets of this linguistic transformation. First, we examine the diffusion and semantic adaptation of newly introduced emojis in digital discourse. By analyzing usage patterns and leveraging large language models (LLMs), we develop an interpretation framework to decode the evolving meanings of new emojis and assess their impact on downstream NLP tasks. Second, we explore how biased associations embedded in training data lead to spurious correlations at the concept level. We demonstrate that LLMs tend to internalize these associations, which can skew their predictions and reinforce societal stereotypes. By identifying the mechanisms behind such biases, we highlight the importance of mitigating shortcut learning in both pre-training and fine-tuning stages. Third, we investigate how emojis, originally designed for neutral or positive expression, are repurposed for offensive communication. 
We develop a multi-step LLM-based pipeline to identify and replace offensive emojis in social media content while preserving the original semantic intent. Our human evaluations demonstrate that this approach reduces perceived offensiveness without sacrificing clarity or meaning. Together, these three investigations provide a comprehensive account of how language evolves in digital environments—and how NLP systems can better keep pace. Our findings underscore the need for adaptive, socially aware computational frameworks that account for linguistic fluidity, community-specific conventions, and evolving symbolic practices. By aligning NLP models more closely with the dynamics of human communication, this dissertation contributes to the development of more inclusive, responsive, and semantically grounded language technologies. | en_US |
dc.identifier | https://doi.org/10.13016/w2ze-uzal | |
dc.identifier.uri | http://hdl.handle.net/1903/34231 | |
dc.language.iso | en | en_US |
dc.subject.pqcontrolled | Information science | en_US |
dc.subject.pqcontrolled | Communication | en_US |
dc.subject.pqcontrolled | Computer science | en_US |
dc.subject.pquncontrolled | Computational Social Science | en_US |
dc.subject.pquncontrolled | Large Language Models | en_US |
dc.subject.pquncontrolled | Natural Language Processing | en_US |
dc.subject.pquncontrolled | Social Media Mining | en_US |
dc.title | Evolving Language Patterns on the Web: Community Influence, Model Adaptation, and Bias Mitigation | en_US |
dc.type | Dissertation | en_US |