Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.


Search Results

Now showing 1 - 2 of 2
  • Towards Effective and Inclusive AI: Aligning AI Systems with User Needs and Stakeholder Values Across Diverse Contexts
    (2024) Cao, Yang; Daumé III, Hal; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Inspired by the Turing test, a long line of research in AI has focused on technical improvement on tasks thought to require human-like comprehension. However, this focus has often resulted in models with impressive technical capabilities but uncertain real-world applicability. Despite the advancements of large pre-trained models, we still see various failure cases that affect groups facing discrimination and that arise in specific applications. A major problem here is the detached model development process: these models are designed, developed, and evaluated with limited consideration of their users and stakeholders. My dissertation is dedicated to addressing this detachment by examining how artificial intelligence (AI) systems can be more effectively aligned with the needs of users and the values of stakeholders across diverse contexts. This work aims to close the gap between the current state of AI technology and its meaningful application in the lives of real-life stakeholders. My thesis explores three key aspects of aligning AI systems with human needs and values: identifying sources of misalignment, addressing the needs of specific user groups, and ensuring value alignment across diverse stakeholders. First, I examine potential causes of misalignment in AI system development, focusing on gender biases in natural language processing (NLP) systems. I demonstrate that without careful consideration of real-life stakeholders, AI systems are prone to biases entering at each development stage. Second, I explore the alignment of AI systems for specific user groups by analyzing two real-life application contexts: a content moderation assistance system for volunteer moderators and a visual question answering (VQA) system for blind and visually impaired (BVI) individuals. In both contexts, I identify significant gaps in AI systems and provide directions for better alignment with users’ needs.
Finally, I assess the alignment of AI systems with human values, focusing on stereotype issues within general large language models (LLMs). I propose a theory-grounded method for systematically evaluating stereotypical associations and exploring their impact on diverse user identities, including intersectional identity stereotypes and the leakage of stereotypes across cultures. Through these investigations, this dissertation contributes to the growing field of human-centered AI by providing insights, methodologies, and recommendations for aligning AI systems with the needs and values of diverse stakeholders. By addressing the challenges of misalignment, user-specific needs, and value alignment, this work aims to foster the development of AI technologies that effectively collaborate with and empower users while promoting fairness, inclusivity, and positive social impact.
  • OPTIMIZING THE ACCURACY OF LIGHTWEIGHT METHODS FOR SHORT READ ALIGNMENT AND QUANTIFICATION
    (2021) Zakeri, Mohsen; Patro, Rob; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The analysis of high-throughput sequencing (HTS) data includes a number of involved computational steps, ranging from transcriptome assembly and the mapping or alignment of reads to existing or assembled sequences, to estimating the abundance of sequenced molecules, performing differential or comparative analysis between samples, and even inferring dynamics of interest from snapshot data. Many methods have been developed for these different tasks that provide various trade-offs in terms of accuracy and speed, because accuracy and robustness typically come at the expense of speed and vice versa. In this work, I focus on the problems of alignment and quantification of RNA-seq data, and review different aspects of the available methods for these problems. I explore finding a reasonable balance between these competing goals, and introduce methods that provide accurate results without sacrificing speed. Alignment of sequencing reads to known reference sequences is a challenging computational step in the RNA-seq pipeline, mainly because of the large size of sample data and reference sequences and the presence of highly repetitive sequences. Recently, the concept of lightweight alignment was introduced to accelerate the mapping step of abundance estimation. I collaborated with my colleagues to explore some of the shortcomings of lightweight alignment methods, and to address those with a new approach called selective-alignment. Moreover, we introduce an aligner, Puffaligner, which benefits from both the indexing approach introduced by the Pufferfish index and selective-alignment to produce accurate alignments in a short amount of time compared to other popular aligners. To improve the speed of RNA-seq quantification given a collection of alignments, some tools group fragments (reads) into equivalence classes, which are sets of fragments that are compatible with the same subset of reference sequences.
Summarizing the fragments into equivalence classes factorizes the likelihood function being optimized and increases the speed of the typical optimization algorithms deployed. I explore how this factorization affects the accuracy of abundance estimates, and propose a new factorization approach that demonstrates higher fidelity to the non-approximate model. Finally, estimating the posterior distribution of transcript expression is a crucial step in finding robust and reliable estimates of transcript abundance in the presence of high levels of multi-mapping. To assess the accuracy of their point estimates, quantification tools generate inferential replicates using techniques such as bootstrap sampling and Gibbs sampling. The utility of inferential replicates has been demonstrated in different downstream RNA-seq applications, such as differential expression analysis. I explore how sampling from both observed and unobserved data points (reads) improves the accuracy of bootstrap sampling. I demonstrate the utility of this approach in estimating allelic expression with RNA-seq reads, where the absence of reads that map uniquely to reference transcripts is a major obstacle for calculating robust estimates.
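
As an illustration of the equivalence-class idea described in the abstract above, the following is a minimal sketch, not the dissertation's method or any tool's actual implementation. It assumes each fragment comes with a list of compatible reference sequences; the function name `build_equivalence_classes`, the read names, and the transcript names are all hypothetical. Fragments with the same compatible set collapse into one class with a count, so a downstream optimizer could iterate over classes rather than individual fragments.

```python
from collections import Counter


def build_equivalence_classes(fragment_compat):
    """Group fragments by the set of reference sequences they are compatible with.

    fragment_compat: dict mapping a fragment id to an iterable of reference ids
    (hypothetical input format, chosen only for this sketch).
    Returns a Counter mapping each frozenset of reference ids to the number
    of fragments that share exactly that compatible set.
    """
    classes = Counter()
    for refs in fragment_compat.values():
        # frozenset ignores the order in which references were reported
        classes[frozenset(refs)] += 1
    return classes


# Toy, made-up data: four reads, three transcripts.
fragment_compat = {
    "read1": ["t1", "t2"],
    "read2": ["t2", "t1"],
    "read3": ["t3"],
    "read4": ["t1", "t2"],
}

eq_classes = build_equivalence_classes(fragment_compat)
for refs, count in eq_classes.items():
    print(sorted(refs), count)
```

In this toy input, the four reads reduce to two classes, which is the source of the speed-up: the per-class count stands in for repeated per-fragment terms. Any per-fragment information beyond the compatible set is discarded here, which is the kind of approximation the abstract says can affect accuracy.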