Computer Science Theses and Dissertations

Recent Submissions

  • Item
    AI Empowered Music Education
    (2024) Shrestha, Snehesh; Aloimonos, Yiannis; Fermüller, Cornelia; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Learning a musical instrument is a complex process involving years of practice and feedback. However, dropout rates in music programs, particularly among violin students, remain high due to socioeconomic barriers and the challenge of mastering the instrument. This work explores the feasibility of accelerating learning and leveraging technology in music education, with a focus on bowed string instruments, specifically the violin. My research identifies workflow gaps and challenges for the stakeholders, aiming to address not only the improvement of learning outcomes but also the provision of opportunities for socioeconomically challenged students. Three key areas are emphasized: designing user studies and creating a comprehensive violin dataset, developing tools and deep learning algorithms for accurate performance assessment, and crafting a practice platform for student feedback. Three fundamental perspectives were essential: a) understanding the stakeholders and their specific challenges, b) understanding how the instrument operates and what actions the player must master to control its functions, and c) addressing the technical challenges associated with constructing and implementing detection and feedback systems. The existing datasets were inadequate for analyzing violin playing, primarily due to their lack of diversity in body types and skill levels, as well as the absence of well-synchronized and calibrated video data, along with corresponding ground truth 3D poses and musical events. Our experiment design ensured that the collected data would be suitable for downstream tasks. These considerations played a significant role in determining the metrics used to evaluate the accuracy of the data and the success metrics for the subsequent tasks. At the foundation of movement analysis lies 3D human pose estimation. 
Unfortunately, the current state-of-the-art algorithms face challenges in accurately estimating monocular 3D poses during instrument playing. These challenges arise from factors such as occlusions, partial views, human-object interactions, limited viewing angles, pixel density, and camera sampling rates. To address these issues, we developed a novel 3D pose estimation algorithm based on the insight that the music produced by the violin is a direct result of the corresponding motions. Our algorithm integrates visual observations with audio inputs to generate precise, high-resolution 3D pose estimates that are temporally consistent and conducive to downstream tasks. Providing effective feedback to learners is a nuanced process that requires balancing encouragement with challenge. Without a user-friendly interface and a motivational strategy, feedback runs the risk of being counterproductive. While current systems excel at detecting pitch and temporal misalignments and visually displaying them for analysis, they often overwhelm players. In this dissertation, we introduce two novel feedback systems. The first is a visual-haptic feedback system that overlays simple augmented cues on the user's body, gently guiding them back to the correct posture. The second is a haptic band synchronized with the music, enhancing students' perception of rhythmic timing and bowing intensities. Additionally, we developed an intuitive user interface for real-time feedback during practice sessions and performance reviews. This data can be shared with teachers to provide deeper insights into students' struggles and to track progress. This research aims to empower both students and teachers. By providing students with feedback during individual practice sessions and equipping teachers with tools to monitor and tailor AI interventions according to their preferences, this work serves as a valuable teaching assistant. 
By addressing tasks that teachers may not prefer or physically perform, such as personalized feedback and progress tracking, this research endeavors to democratize access to high-quality music education and mitigate dropout rates in music programs.
  • Item
    Approximate Nearest Neighbor Search with Filters
    (2024) Landrum, Benjamin Thomas; Dhulipala, Laxman; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Approximate nearest neighbor search (ANNS) on high-dimensional vectors is a fundamental primitive used widely for search over neural embeddings of unstructured data. Prior work on ANNS has produced indices which provide fast and accurate search on datasets up to billions of points, but are not well suited to queries restricted to some subset of the original dataset. Filtered ANNS is a formulation of the problem which adds metadata to points in the dataset which can be used to filter points at query time. This setting requires indexing a dataset in a metadata-aware way to support filtered queries. Filtered ANNS is important for applications such as product and image search, and necessary to give recently popular 'vector databases' functionality similar to more traditional tabular databases. This work concerns two versions of the filtered ANNS problem. The most popular formulation in prior work associates points with boolean metadata in the form of labels and filters queries using a boolean predicate on these labels. In this setting, we present a novel index with state-of-the-art performance for queries with filters requiring either one label or both of a pair of labels which won a large benchmarking competition's track focused on the problem. We also introduce a novel formulation of filtered ANNS called 'window-filtered' ANNS, in which points are associated with a continuous metadata value (in practical use, this corresponds to a timestamp, measure of popularity, etc.), and queries are filtered to a range of metadata values. In addition to describing the problem, we present a practical and theoretically motivated index which handily outperforms baselines.
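As a concrete illustration of the window-filtered formulation, the following is a brute-force reference sketch, not the index contributed by this dissertation; the function name, data, and parameters are all illustrative:

```python
import numpy as np

def window_filtered_knn(points, labels, query, lo, hi, k=5):
    """Brute-force window-filtered k-NN: restrict to points whose
    scalar metadata (e.g., a timestamp) lies in [lo, hi], then rank
    the survivors by Euclidean distance to the query vector."""
    mask = (labels >= lo) & (labels <= hi)
    idx = np.nonzero(mask)[0]
    if idx.size == 0:
        return np.array([], dtype=int)
    dists = np.linalg.norm(points[idx] - query, axis=1)
    return idx[np.argsort(dists)[:k]]

rng = np.random.default_rng(0)
pts = rng.standard_normal((1000, 16))       # embedding vectors
ts = rng.uniform(0.0, 100.0, size=1000)     # continuous metadata (timestamps)
q = rng.standard_normal(16)
hits = window_filtered_knn(pts, ts, q, lo=20.0, hi=40.0, k=5)
```

A real index must answer such queries without scanning every point, which is what makes the metadata-aware indexing described above necessary.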
  • Item
    A Framework for Benchmarking Graph-Based Artificial Intelligence
    (2024) O'Sullivan, Kent Daniel; Regli, William C; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Graph-based Artificial Intelligence (GraphAI) encompasses AI problems formulated using graphs, operating on graphs, or relying on graph structures for learning. Contemporary Artificial Intelligence (AI) research explores how structured knowledge from graphs can enhance existing approaches to meet the real world’s demands for transparency, explainability, and performance. Characterizing GraphAI performance is challenging because different combinations of graph abstractions, representations, algorithms, and hardware acceleration techniques can trigger unpredictable changes in efficiency. Although benchmarks enable testing different GraphAI implementations, most cannot currently capture the complex interaction between effectiveness and efficiency, especially across dynamic knowledge graphs. This work proposes an empirical ‘grey-box’ approach to GraphAI benchmarking, providing a method that enables experimentally trading between effectiveness and efficiency across different combinations of graph abstractions, representations, algorithms, and hardware accelerators. A systematic literature review yields a taxonomy of GraphAI tasks and a collection of intelligence and security problems that interact with GraphAI. The taxonomy and problem survey guide the development of a framework that fuses empirical computer science with constraint theory in an approach to benchmarking that does not require invasive workload analyses or code instrumentation. We formalize a methodology for developing problem-centric GraphAI benchmarks and develop a tool to create graphs from OpenStreetMap data to fill a gap in real-world mesh graph datasets required for benchmark inputs. Finally, this work provides a completed benchmark for the Population Segmentation Intelligence and Security problem developed using the GraphAI benchmark problem development methodology. 
It provides experimental results that validate the utility of the GraphAI benchmark framework for evaluating if, how, and when GraphAI acceleration should be applied to the population segmentation problem.
  • Item
    Optimal Point-Spread-Function Engineering with Dynamic Optics and Event Cameras
    (2024) Shah, Sachin; Metzler, Christopher A; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Computational imaging systems co-design optics and algorithms to observe phenomena beyond the reach of traditional cameras. Point-spread-function (PSF) engineering is a powerful technique wherein a custom phase mask is integrated into an optical system to encode additional information into captured images. Used in combination with deep learning, such systems now offer state-of-the-art performance at three-dimensional molecule localization, extended depth-of-field imaging, lensless imaging, and other tasks. Recent hardware breakthroughs are unlocking unprecedented ultrafast capabilities: micro-electromechanical-system-based spatial light modulators allow us to modulate light at kilohertz rates, and neuromorphic event cameras enable kilohertz, low-power, high-dynamic-range capture. Unfortunately, existing theories and algorithms are unable to fully harness these new capabilities. This work answers a natural question: Can one encode additional information and achieve superior performance by leveraging the ultrafast capabilities of spatial light modulators and event cameras? We first prove that the set of PSFs described by static phase masks is non-convex and that, as a result, time-averaged PSFs generated by dynamic phase masks displayed on a spatial light modulator are fundamentally more expressive. We then derive the theoretical limits on three-dimensional tracking with PSF-engineered event cameras. Using these bounds, we design new optimal phase masks and binary amplitude masks. We demonstrate the efficacy of our designs through extensive simulations and validate our method with a simple lab prototype.
  • Item
    A Comparative Study Of Outlier Detection Methods And Their Downstream Effects
    (2024) Adipudi, Vikram; Herrmann, Jeffrey W.; Systems Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    When fitting machine learning models, outliers in the dataset can cause overfitting, leading to incorrect predictions and diminishing the usefulness of the model. Outlier detection is therefore conducted as a precursor step to avoid such errors and improve model performance. This study compares how different outlier detection methods impact regression, classification, and clustering, with the goal of identifying which outlier detection method performs best in conjunction with each task. To conduct the study, multiple outlier detection algorithms were used to clean datasets, the cleaned data was fed into the models, and model performance with and without cleaning was compared to identify trends. The study found that outlier detection of any kind has little impact on supervised tasks such as regression and classification. For the unsupervised task, the outlier detection and removal algorithms that most improved clustering varied across clustering models; most commonly, IForest and PCA had the greatest impact on clustering methods.
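The study's protocol (clean with an outlier detector, then compare model performance with and without cleaning) can be sketched as follows, assuming scikit-learn; the synthetic data, detector settings, and metric are illustrative, not the study's actual experimental setup:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(500)
y[:10] += 50.0  # inject a handful of gross outliers into the targets

def fit_score(X, y):
    """Fit a regressor and report in-sample R^2."""
    model = LinearRegression().fit(X, y)
    return r2_score(y, model.predict(X))

# Baseline: fit on the raw, contaminated data.
baseline = fit_score(X, y)

# Cleaned: drop points that IsolationForest flags as outliers (-1).
flags = IsolationForest(contamination=0.02, random_state=0).fit_predict(
    np.column_stack([X, y]))
keep = flags == 1
cleaned = fit_score(X[keep], y[keep])
```

The same loop, repeated over detectors (e.g., IForest, PCA-based) and downstream models, yields the comparison tables the study describes.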
  • Item
    Interpretability of Deep Models Across different Architectures and Modalities
    (2024) Kazemitabaiezavare, Seyedhamidreza; Goldstein, Tom; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The exploration of deep models has been a persistent focus in research over time. Model inversion and feature visualization stand out as significant methods for deciphering the workings of deep models. They play a vital role in unraveling the inner workings of neural architectures, understanding their acquired knowledge, and elucidating their behaviors. This dissertation comprises three chapters dedicated to investigating deep models, particularly newly emerging ones such as Vision Transformers (ViTs) and CLIP, using model inversion and feature visualization techniques. In the first chapter, we introduce Plug-In Inversion, a model inversion method that, instead of relying on regularizers, relies on a set of augmentations, sidestepping the need for extensive hyperparameter tuning. We demonstrate the efficacy of our approach by applying it to invert Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and Multi-Layer Perceptrons (MLPs). The contributions of this work are summarized as follows: (I) We provide a detailed analysis of various augmentations and how they affect the quality of images produced via class inversion. (II) We introduce Plug-In Inversion (PII), a new class inversion technique based on these augmentations, and compare it to existing techniques. (III) We apply PII to dozens of pre-trained models of varying architecture, justifying our claim that it can be ‘plugged in’ to most networks without modification. (IV) In particular, we show that PII succeeds in inverting ViTs and large MLP-based architectures, which, to our knowledge, has not previously been accomplished. (V) Finally, we explore the potential for combining PII with prior methods. In the second chapter, following PII and the techniques it introduced, we utilize model inversion to examine CLIP models. 
Unlike traditional classification models that are restricted to a predefined set of classes, CLIP models are free of that restriction, and we can apply model inversion to any choice of prompts. Our contributions in this chapter are as follows: (I) In recent years, generative models have shown the capability to blend concepts. We demonstrate that the same holds true for CLIP models, and the knowledge embedded inside CLIP models can blend concepts. (II) We demonstrate that through inversion, seemingly harmless prompts, such as celebrity names, can produce NSFW images. This is particularly true for women celebrities, whom the CLIP model seems to strongly associate with sexual content. Certain identities, like “Dakota Johnson,” are close to many NSFW words in the embedding space. This may be problematic since the embeddings of CLIP models are being used in many text-to-image generative models. Addressing this issue requires more meticulous data curation during the training of large-scale models. (III) We demonstrate that CLIP models display gender bias in their knowledge through inversions applied to prompts related to professions and status. (IV) We investigate the effect of the scale of the training data on the quality of the inversions, and we show that more training data leads to better inversions. In the third chapter, we delve into interpreting Vision Transformers using feature visualization. While feature visualizations have provided valuable insights into the workings of Convolutional Neural Networks (CNNs), these methods have struggled to interpret ViT representations due to their inherent complexity. Nevertheless, we demonstrate that with proper application to the appropriate representations, feature visualizations can be successful with ViTs. This newfound understanding enables us to delve visually into ViTs and the information they extract from images. 
The contributions of this chapter are as follows: (I) We observe that uninterpretable and adversarial behavior occurs when applying standard methods of feature visualization to the relatively low-dimensional components of transformer-based models, such as keys, queries, or values. However, applying these tools to the relatively high-dimensional features of the position-wise feedforward layer results in successful and informative visualizations. We conduct large-scale visualizations on a wide range of transformer-based vision models, including ViTs, DeiT, CoaT, ConViT, PiT, Swin, and Twin, to validate the effectiveness of our method. (II) We show that patch-wise image activation patterns for ViT features essentially behave like saliency maps, highlighting the regions of the image a given feature attends to. This behavior persists even for relatively deep layers, showing the model preserves the positional relationship between patches instead of using them as global information stores. (III) We compare the behavior of ViTs and CNNs, finding that ViTs make better use of background information and rely less on high-frequency, textural attributes. Both types of networks build progressively more complex representations in deeper layers and eventually contain features responsible for detecting distinct objects. (IV) We investigate the effect of natural language supervision with CLIP on the types of features extracted by ViTs. We find CLIP-trained models include various features clearly catered to detecting components of images corresponding to caption text, such as prepositions, adjectives, and conceptual categories.
  • Item
    MATHEMATICS OF THE DYNAMICS AND CONTROL OF THE SARS-COV-2 PANDEMIC
    (2024) Pant, Binod; Gumel, Abba B.; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The pneumonia-like illness that emerged late in 2019, caused by SARS-CoV-2 (and coined COVID-19), became the greatest public health challenge humans have faced since the 1918/1919 influenza pandemic, causing over 670 million confirmed cases and 7 million fatalities globally. This dissertation contributes by providing deep qualitative insights into the transmission dynamics and control of the pandemic, using mathematical modeling approaches together with data analytics and computation. Specifically, it addresses some of the pertinent challenges associated with modeling the dynamics of the disease, notably the disproportionate effect of the disease on certain (risk and demographic) populations (inducing various heterogeneities) and behavior changes with respect to adherence or lack thereof to interventions. An $m$-group model, which monitors the temporal dynamics of the disease in $m$ heterogeneous populations, was designed and used to study the impact of age heterogeneity and vaccination on the spread of the disease in the United States. For instance, the disease-free equilibrium for the case of the model with $m=1$ (i.e., the model with a homogeneous population) was shown to be globally-asymptotically stable for two special cases (when the vaccine is perfect or when disease-induced mortality is negligible) whenever the associated reproduction number of the homogeneous model is less than one. The homogeneous model has a unique endemic equilibrium whenever the reproduction threshold exceeds unity (this equilibrium was shown to be globally-asymptotically stable for a special case, using a nonlinear Lyapunov function of Goh-Volterra type). The homogeneous model was fitted to the observed cumulative mortality data for the SARS-CoV-2 pandemic in the United States during the period from January to May of 2022 (when Omicron was the predominant variant). 
It was shown that vaccine-derived herd immunity (needed to eliminate the disease) cannot be attained using the homogeneous model regardless of the proportion of individuals fully vaccinated. Such vaccine-derived immunity can, however, be achieved using the $m$-group heterogeneous model, with $m=2$ (where the total population is split into two groups: those under 65 years of age, and those 65 years and older), if at least 61% of the susceptible population is fully vaccinated. Thus, this dissertation shows that heterogeneity reduces the level of vaccine coverage needed to eliminate the pandemic (and models that do not account for heterogeneity may be over-estimating the vaccination coverage needed to achieve herd immunity in the community). To quantify the impact of human behavior changes on the spread and control of the pandemic, we designed a novel behavior-epidemiology model which considers numerous metrics for inducing human behavior changes (such as current level of disease burden and intervention adherence fatigue). Unlike the equivalent model without human behavior explicitly incorporated, the behavior-epidemiology model fits the observed cumulative mortality and predicts the observed daily mortality data very well. It was also shown that the behavior metrics related to the level of SARS-CoV-2 mortality and symptomatic transmission were more influential in inducing positive behavior changes than all other behavior metrics considered. Finally, a model was developed to assess the utility of wastewater surveillance to study the transmission dynamics and control of SARS-CoV-2 in a community. Specifically, we developed and calibrated a wastewater-based epidemiology model using wastewater data from Miami-Dade county, Florida, during the third wave of the SARS-CoV-2 pandemic. The model showed a strong correlation between the observed (detected) weekly case data and the corresponding weekly data predicted by the calibrated model. 
The model's prediction of the week when the maximum number of SARS-CoV-2 cases would be recorded in the county during the simulation period precisely matched the time when the maximum observed/reported cases were recorded (August 14, 2021). Furthermore, the model's projection of the maximum number of cases for the week of August 14, 2021 was about 15 times higher than the maximum observed weekly case count for the county that week (i.e., the maximum case count estimated by the model was 15 times higher than the actual/observed count for confirmed cases). In addition to being in line with other modeling studies, this result is consistent with the CDC estimate that the reported confirmed case data may be 10 times lower than the actual burden (since the confirmed data did not account for asymptomatic and presymptomatic transmission). Furthermore, the model accurately predicted a one-week lag between the peak in weekly COVID-19 case and hospitalization data during the time period of the study in Miami-Dade, with the model-predicted hospitalizations peaking on August 21, 2021. Detailed time-varying global sensitivity analysis was carried out to determine the parameters (wastewater-based, epidemiological and biological) that have the most influence on the chosen response function (namely, the cumulative viral load in the wastewater). This analysis identified key parameters that significantly affect the value of the response function (hence, they should be targeted for intervention). This dissertation conclusively showed that wastewater surveillance data can be a very powerful indicator for measuring (i.e., providing early-warning signal and current burden) and predicting the future trajectory and burden (e.g., number of cases and hospitalizations) of emerging and re-emerging infectious diseases, such as SARS-CoV-2, in a community.
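For context, the vaccine-adjusted herd-immunity threshold underlying coverage figures such as the 61% above commonly takes the following standard form (generic notation, not necessarily the dissertation's):

```latex
% Fraction f_v of the population that must be fully vaccinated,
% for a vaccine of efficacy \varepsilon and reproduction number \mathcal{R}_0:
f_v \;\geq\; \frac{1}{\varepsilon}\left(1 - \frac{1}{\mathcal{R}_0}\right)
% e.g., \varepsilon = 0.95 and \mathcal{R}_0 = 3 give f_v \geq 0.70 (approx.).
```

Group-structured models replace the scalar $\mathcal{R}_0$ with the spectral radius of a next-generation matrix, which is how heterogeneity can lower the required coverage.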
  • Item
    Quantum Algorithms for Nonconvex Optimization: Theory and Implementation
    (2024) Leng, Jiaqi; Wu, Xiaodi XW; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Continuous optimization problems arise in virtually all disciplines of quantitative research. While convex optimization has been well-studied in recent decades, large-scale nonconvex optimization problems remain intractable in both theory and practice. Quantum computers are expected to outperform classical computers in certain challenging computational problems. Some quantum algorithms for convex optimization have shown asymptotic speedups, while the quantum advantage for nonconvex optimization is yet to be fully understood. This thesis focuses on Quantum Hamiltonian Descent (QHD), a quantum algorithm for continuous optimization problems. A systematic study of Quantum Hamiltonian Descent is presented, including theoretical results concerning nonconvex optimization and efficient implementation techniques for quantum computers. Quantum Hamiltonian Descent is derived as the path integral of classical gradient descent algorithms. Due to the quantum interference of classical descent trajectories, Quantum Hamiltonian Descent exhibits drastically different behavior from classical gradient descent, especially for nonconvex problems. Under mild assumptions, we prove that Quantum Hamiltonian Descent can always find the global minimum of an unconstrained optimization problem given a sufficiently long runtime. Moreover, we demonstrate that Quantum Hamiltonian Descent can efficiently solve a family of nonconvex optimization problems with exponentially many local minima, which most commonly used classical optimizers require super-polynomial time to solve. Additionally, by using Quantum Hamiltonian Descent as an algorithmic primitive, we show a quadratic oracle separation between quantum and classical computing. We consider the implementation of Quantum Hamiltonian Descent for two important paradigms of quantum computing, namely digital (fault-tolerant) and analog quantum computers. 
Exploiting the product formula for quantum Hamiltonian simulation, we demonstrate that a digital quantum computer can implement Quantum Hamiltonian Descent with gate complexity nearly linear in problem dimension and evolution time. With a hardware-efficient sparse Hamiltonian simulation technique known as Hamiltonian embedding, we develop an analog implementation recipe for Quantum Hamiltonian Descent that addresses a broad class of nonlinear optimization problems, including nonconvex quadratic programming. This analog implementation approach is deployed on large-scale quantum spin-glass simulators, and the empirical results strongly suggest that Quantum Hamiltonian Descent has great potential for highly nonconvex and nonlinear optimization tasks.
  • Item
    Towards Immersive Streaming for Videos and Light Fields
    (2024) Li, David; Varshney, Amitabh; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    As virtual and augmented reality devices evolve with new applications, the ability to create and transmit immersive content becomes ever more critical. In particular, mobile, standalone devices have power, computing, and bandwidth limitations which require careful thought on how to deliver content to users. In this dissertation, we examine techniques to enable adaptive streaming of two types of content: 360° panoramic videos and light fields. With the rapidly increasing resolutions of 360° cameras, head-mounted displays, and live-streaming services, streaming high-resolution panoramic videos over limited-bandwidth networks is becoming a critical challenge. Foveated video streaming can address this rising challenge in the context of eye-tracking-equipped virtual reality head-mounted displays. We introduce a new log-rectilinear transformation incorporating summed-area table filtering and off-the-shelf video codecs to enable foveated streaming of 360° videos suitable for VR headsets with built-in eye-tracking. Our technique results in a 31% decrease in flickering and a 10% decrease in bit rate with H.264 streaming while maintaining similar or better quality. Neural representations have shown great promise in compactly representing radiance and light fields. However, existing neural representations are not suited for streaming as decoding can only be done at a single level of detail and requires downloading the entire neural network model. To resolve these challenges, we present a progressive multi-scale light field network that encodes light fields with multiple levels of detail across various subsets of the network weights. With our approach, light field networks can render starting with less than 7% of the model weights and progressively depict greater levels of detail as more model weights are streamed. Existing methods for levels of detail in neural representations focus on a few discrete levels of detail. 
While a few discrete LODs are enough to enable progressive streaming and reduce artifacts, transitioning between LODs becomes a challenge as an instant transition can result in a popping artifact, blending requires two render passes at adjacent LODs, and dithering can briefly appear as flickering. Additionally, models with a few LODs create large model deltas and can only coarsely adapt to bandwidth and compute resources. To address these limitations, we present continuous levels of detail for light field networks to address flickering artifacts during transitions across levels of detail and enable more granular adaptation to available resources. With our approach, we reduce flickering between successive model updates by approximately 40-80% and increase the number of performance levels at which the model can be executed from 4 to 385. By rendering levels of detail at each possible network width, we additionally reduce the model size deltas from over a hundred rows and columns per layer down to a single row and column per layer, for smoother streaming potential.
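The idea of executing a model at many network widths can be illustrated with a toy MLP sketch; this is a simplification with assumed ReLU layers, not the dissertation's light field network, and all names are illustrative:

```python
import numpy as np

def forward_at_width(weights, x, fraction):
    """Toy width-based level of detail: evaluate an MLP using only the
    first `fraction` of each hidden layer's units. The output layer keeps
    all of its rows so the output dimension is unchanged."""
    n_layers = len(weights)
    for i, (W, b) in enumerate(weights):
        W = W[:, : x.shape[0]]            # match the (possibly truncated) input
        if i < n_layers - 1:              # hidden layer: truncate its units
            k = max(1, int(round(fraction * W.shape[0])))
            W, b = W[:k], b[:k]
        x = W @ x + b
        if i < n_layers - 1:
            x = np.maximum(x, 0.0)        # ReLU on hidden layers
    return x

rng = np.random.default_rng(2)
dims = [8, 64, 64, 3]                     # input, two hidden layers, output
weights = [(rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i]),
            np.zeros(dims[i + 1])) for i in range(len(dims) - 1)]
x = rng.standard_normal(8)
coarse = forward_at_width(weights, x, fraction=0.25)  # 16 of 64 units per layer
full = forward_at_width(weights, x, fraction=1.0)
```

Because every width is usable, streaming one more row and column per layer moves the model to the next performance level, which is the granularity described above.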
  • Item
    CUR Matrix Approximation Through Convex Optimization
    (2024) Linehan, Kathryn; Balan, Radu V; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In this dissertation we present work on the CUR matrix approximation. Specifically, we present 1) an approximation of the proximal operator of the L-infinity norm using a neural network, 2) a novel deterministic CUR formulation and algorithm, and 3) a novel application of CUR as a feature selection method to determine discriminant proteins when clustering protein expression data in a self-organizing map (SOM). The proximal operator of the L-infinity norm arises in our CUR algorithm. Since the computation of the proximal operator of the L-infinity norm requires a sort of the input data (or at least a partial sort similar to quicksort), we present a neural network to approximate the proximal operator. A novel aspect of the network is that it is able to accept vectors of varying lengths due to a feature selection process that uses moments of the input data. We present results on the accuracy of the approximation, feature importance, and computational efficiency of the approach, and present an algorithm to calculate the proximal operator of the L-infinity norm exactly, relate it to the Moreau decomposition, and compare its computational efficiency to that of the approximation. Next, we present a novel deterministic CUR formulation that uses convex optimization to form the matrices C and R, and a corresponding algorithm that uses bisection to ensure that the user selected number of columns appear in C and the user selected number of rows appear in R. We implement the algorithm using the surrogate functional technique of Daubechies et al. [Communications on Pure and Applied Mathematics, 57.11 (2004)] and extend the theory of this approach to apply to our CUR formulation. Numerical results are presented that demonstrate the effectiveness of our CUR algorithm as compared to the singular value decomposition (SVD) and other CUR algorithms. Last, we use our CUR approximation as a feature selection method in the application by Higuera et al. 
[PLOS ONE, 10(6) (2015)] to determine discriminant proteins when clustering protein expression data in an SOM. This is a novel application of CUR and to the best of our knowledge, this is the first use of CUR on protein expression data. We compare the performance of our CUR algorithm to other CUR algorithms and the Wilcoxon rank-sum test (the original feature selection method in the work).
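The sort-based proximal computation mentioned above can be sketched via the Moreau decomposition, under which the proximal operator of the L-infinity norm reduces to a Euclidean projection onto an L1 ball; this is a minimal reference implementation, not the dissertation's neural approximation:

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the L1 ball of the given radius,
    via the standard sort-and-threshold algorithm."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                      # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)         # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(x, lam):
    """Proximal operator of lam * ||.||_inf via the Moreau decomposition:
    prox_{lam*||.||_inf}(x) = x - Proj_{ ||z||_1 <= lam }(x)."""
    return x - project_l1_ball(x, lam)

out = prox_linf(np.array([3.0, -1.0, 0.5]), 1.0)      # clips the largest entry
```

The sort inside the projection is the computational bottleneck that motivates approximating this operator with a neural network.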
  • Item
    Towards Effective and Inclusive AI: Aligning AI Systems with User Needs and Stakeholder Values Across Diverse Contexts
    (2024) Cao, Yang; Daumé III, Hal; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Inspired by the Turing test, a long line of research in AI has focused on technical improvement on tasks thought to require human-like comprehension. However, this focus has often resulted in models with impressive technical capabilities but uncertain real-world applicability. Despite the advancements of large pre-trained models, we still see various failures affecting marginalized groups and arising in specific applications. A major problem here is the detached model development process: these models are designed, developed, and evaluated with limited consideration of their users and stakeholders. My dissertation is dedicated to addressing this detachment by examining how artificial intelligence (AI) systems can be more effectively aligned with the needs of users and the values of stakeholders across diverse contexts. This work aims to close the gap between the current state of AI technology and its meaningful application in the lives of real-life stakeholders. My thesis explores three key aspects of aligning AI systems with human needs and values: identifying sources of misalignment, addressing the needs of specific user groups, and ensuring value alignment across diverse stakeholders. First, I examine potential causes of misalignment in AI system development, focusing on gender biases in natural language processing (NLP) systems. I demonstrate that without careful consideration of real-life stakeholders, AI systems are prone to biases entering at each development stage. Second, I explore the alignment of AI systems for specific user groups by analyzing two real-life application contexts: a content moderation assistance system for volunteer moderators and a visual question answering (VQA) system for blind and visually impaired (BVI) individuals. In both contexts, I identify significant gaps in AI systems and provide directions for better alignment with users’ needs. 
Finally, I assess the alignment of AI systems with human values, focusing on stereotype issues within general large language models (LLMs). I propose a theory-grounded method for systematically evaluating stereotypical associations and exploring their impact on diverse user identities, including intersectional identity stereotypes and the leakage of stereotypes across cultures. Through these investigations, this dissertation contributes to the growing field of human-centered AI by providing insights, methodologies, and recommendations for aligning AI systems with the needs and values of diverse stakeholders. By addressing the challenges of misalignment, user-specific needs, and value alignment, this work aims to foster the development of AI technologies that effectively collaborate with and empower users while promoting fairness, inclusivity, and positive social impact.
  • Item
    Advancing the State of Auto-tuning with Programming Languages
    (2024) Chen, Ray Sun; Hollingsworth, Jeffrey K; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In the realm of computer science, auto-tuning refers to techniques for software performance optimization. The focus of traditional auto-tuning research is to identify novel performance parameters to expand the optimization space for a given target software/platform combination, and improve the automated search within this optimization space. This makes high-performance computing (HPC) a prime candidate for auto-tuning research, as it sits at the nexus of architectural diversity and performance criticality. However, the major successes for HPC auto-tuning to date involve tailoring memory access patterns to specific cache hierarchies. While important, this is just a small piece of the overall performance portability puzzle. I argue that auto-tuning has room to expand and optimize a richer set of HPC application tuning parameters through the combination of novel non-intrusive programming language idioms and advanced lightweight online search techniques. I support my argument through four contributions to the field. This dissertation describes two techniques for expanding auto-tuning optimization spaces, and two techniques for distributing the auto-tuning search for parallel efficiency.
  • Item
    THEORETICAL AND PRACTICAL HIGH-ASSURANCE SOFTWARE TOOLS FOR QUANTUM APPLICATIONS
    (2024) Peng, Yuxiang; Wu, Xiaodi; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Quantum computing promises to revolutionize solutions to valuable computational problems like factorization and quantum system simulation. Harnessing the power of quantum computers in real life cannot happen without a supportive software stack. This thesis studies the fundamental problems appearing in the software for quantum computing, to shed light on shaping software that controls quantum computing devices in the near term and the long run. First, we consider the software for Hamiltonian-oriented quantum computing (HOQC). We discuss our new software framework, SimuQ, for programming quantum Hamiltonian simulation on analog quantum simulators (AQS). To characterize the capability of AQSs, we introduce new abstractions named abstract analog instruction sets (AAIS), which enable us to design several domain-specific languages and a novel compiler. SimuQ reduces knowledge barriers for the study of HOQC because front-end users can easily program Hamiltonian systems and simulate their dynamics on AQSs. We then demonstrate how SimuQ inspires new algorithm design and hardware improvements, and extend our view to the full-stack software support of quantum applications via HOQC. Second, we study the software for circuit-oriented quantum computing (COQC). Many prototypes of quantum programming languages have been proposed in recent years based on quantum circuits, while how to debug programs written in these languages remains a critical challenge. Formal methods, a mature area in programming language research, provide a new perspective on guaranteeing the correctness of quantum programs, and we adapt several formal verification techniques to accommodate the nature of quantum programs. In compilers for quantum while-programs, rewrite rules need to preserve the semantics of the programs. We connect non-idempotent Kleene algebra with quantum program semantics and propose an equivalence-proving system with algebraic expression derivation. 
In addition, deductive program verification can be leveraged to ensure the semantic correctness of quantum programs. To this end, we formally certify an end-to-end implementation of Shor's algorithm, demonstrating the feasibility of deductive verification for quantum programs.
  • Item
    EFFICIENT COMPUTATIONAL ALGORITHMS FOR MAGNETIC EQUILIBRIUM IN A FUSION REACTOR
    (2024) Liang, Jiaxing; Elman, Howard C.; Sanchez-Vizuet, Tonatiuh; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
In a magnetic confinement fusion reactor, like a Tokamak, hydrogen isotopes are injected into the chamber and heated to form a plasma. The hot plasma tends to expand, so it is crucial to confine it within the center of the reactor to prevent contact with the reactor walls. This confinement is achieved through magnetic fields generated by currents running through external coils surrounding the reactor. However, these currents may suffer from uncertainties, which could arise from factors like temperature fluctuations and material impurities. These variables introduce a level of unpredictability in the plasma's behavior and the overall stability of the fusion process. This thesis aims to investigate the impact that stochasticity in the current intensities has on the confinement properties of the plasma and to estimate the expected behavior of the magnetic field. While the focus is on the variability in current intensities, the tools developed can be applied to other sources of uncertainty, such as the positioning of coils and the source term parameters. To quantify the variability in model predictions and to evaluate the statistical properties of solutions over a range of parameter values, traditional sampling methods, like Monte Carlo, often require intensive and expensive nonlinear computations. To tackle this challenge, we propose three approaches. Firstly, we focus on the development and application of a surrogate function, constructed via a stochastic collocation approach on a sparse grid in the parameter space. This surrogate function is employed to replace the nonlinear solution in Monte Carlo sampling processes. For our numerical experiments, we evaluate the efficiency and accuracy of the outcomes produced by the surrogate, in comparison with those obtained through direct nonlinear solutions. 
Our findings indicate that a carefully selected surrogate function reduces the sampling cost -- achieving acceleration factors ranging from 7 to over 30 -- while preserving the accuracy of the results. The second part of the thesis explores the multilevel Monte Carlo approach, investigating its potential for cost savings compared to simple Monte Carlo. This method involves conducting the majority of computations on a sequence of coarser spatial grids compared to what a simple Monte Carlo simulation would typically use. We examine this approach with nonlinear computation, using both uniformly refined meshes and adaptively refined grids guided by a discrete error estimator. Numerical experiments reveal substantial cost reductions achieved through multilevel methods, typically ranging from a factor of 60 to exceeding 200. Adaptive gridding results in more accurate computation of relevant geometric parameters. In the last part of this work, we explore hybrid methods that integrate surrogates with multilevel Monte Carlo to further reduce the sampling cost. We establish the optimal construction and sampling costs for the surrogate-based multilevel Monte Carlo. Numerical results demonstrate that surrogate-based multilevel Monte Carlo remarkably reduces the computational burden, requiring only 0.1 to 14 seconds for a target relative mean square error ranging from $8\times 10^{-3}$ to $2\times10^{-4}$, reducing the cost of direct computation by factors of 50 to 300. In terms of accuracy, the surrogate-based sampling results exhibit close congruence with those obtained via direct computation, both in plasma boundary and geometric descriptors. The primary contributions of our work entail the application of stochastic collocation techniques and multilevel Monte Carlo methods to analyze plasma behavior under uncertainties in current within fusion reactors. Furthermore, we establish the universal sampling cost for the surrogate-enhanced multilevel Monte Carlo approach. 
Our methodology presents a new paradigm for approaching and managing computational challenges in this field.
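The multilevel Monte Carlo identity underlying the cost savings described above can be illustrated on a toy problem. This is a generic sketch with invented names, not the thesis code: the level-l "solve" here is a one-dimensional midpoint quadrature standing in for a nonlinear equilibrium solve on a level-l mesh, and the random parameter stands in for an uncertain current intensity.

```python
import numpy as np

rng = np.random.default_rng(0)

def level_approx(theta, level):
    """Toy level-l approximation: midpoint-rule quadrature of a
    parameter-dependent integrand on a grid with 2**level cells.
    Approximates (e**theta - 1) / theta, with bias shrinking as level grows."""
    n = 2 ** level
    x = (np.arange(n) + 0.5) / n
    return np.mean(np.exp(theta * x))

def mlmc_estimate(max_level, samples_per_level):
    """Multilevel Monte Carlo: E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}].
    Most samples are drawn on the cheap coarse levels; the fine-level
    corrections have small variance, so few samples suffice there."""
    total = 0.0
    for level, n_samples in zip(range(max_level + 1), samples_per_level):
        thetas = rng.normal(1.0, 0.1, size=n_samples)  # uncertain parameter
        fine = np.array([level_approx(t, level) for t in thetas])
        if level == 0:
            total += fine.mean()
        else:
            coarse = np.array([level_approx(t, level - 1) for t in thetas])
            total += (fine - coarse).mean()  # coupled fine/coarse correction
    return total
```

With sample counts decaying across levels, e.g. `mlmc_estimate(3, [4000, 1000, 250, 60])`, the estimate lands close to the reference value E[(e^theta - 1)/theta] at a fraction of the fine-level-only cost.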
  • Item
    FOUNDATIONS OF TRUSTWORTHY DEEP LEARNING: FAIRNESS, ROBUSTNESS, AND EXPLAINABILITY
    (2024) Nanda, Vedant; Dickerson, John; Gummadi, Krishna; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Deep Learning (DL) models, especially with the rise of the so-called foundation models, are increasingly used in real-world applications as autonomous systems (e.g., facial recognition), as decision aids (e.g., medical imaging, writing assistants), and even to generate novel content (e.g., chatbots, image generators). This naturally results in concerns about the trustworthiness of these systems, for example, do the models systematically perform worse for certain subgroups? Are the outputs of these models reliable under perturbations to the inputs? This thesis aims to strengthen the foundations of DL models, so they can be trusted in deployment. I will cover three important aspects of trust: fairness, robustness, and explainability. I will argue that we need to expand the scope of each of these aspects when applying them to DL models and carefully consider possible tradeoffs between these desirable but sometimes conflicting notions of trust. Traditionally, the fairness community has worked on mitigating biases in classical models such as Support Vector Machines (SVMs) and logistic regression. However, a lot of real-world applications where bias shows up in a myriad of ways involve much more complicated DL models. In the first part, I will present two works that show how thinking about fairness for deep learning (DL) introduces new challenges, especially due to their overparametrized nature and susceptibility to adversarial attacks. Robustness literature has focused largely on measuring the invariance of models to carefully constructed (adversarial attacks) or natural (distribution shifts) noise. In the second part, I will argue that to get truly robust models, we must focus on a more general notion of robustness: measuring the alignment of invariances of DL models with other models of perception such as humans. I will present two works that measure shared invariances between (1) DL models and humans, and (2) between DL models. 
Such measurements of robustness provide a measure of "relative robustness", through which we can better understand the failure modes of DL models and work towards building truly robust systems. Finally, in the third part, I will show how even a small subset of randomly chosen neurons from a pre-trained representation can transfer very well to downstream tasks. We call this phenomenon "diffused redundancy", which we observe in a variety of pre-trained representations. This finding challenges existing beliefs in the explainability literature that claim individual neurons learn disjoint semantically meaningful concepts.
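The evaluation protocol behind diffused redundancy (fit a linear probe on a small random subset of feature dimensions and compare it to a probe on the full representation) can be sketched on synthetic features. Everything below is our own illustration, not the dissertation's experiments: the "representation" is synthetic data whose class signal is spread weakly across all dimensions, and the probe is a least-squares classifier standing in for logistic regression.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for a pre-trained representation: the class signal
# is spread diffusely (weakly) across all 512 feature dimensions.
n, d, k = 4000, 512, 32
y = rng.integers(0, 2, size=n)
direction = rng.normal(size=d)                    # touches every "neuron"
X = rng.normal(size=(n, d)) + np.outer(2.0 * y - 1.0, 0.25 * direction)

def probe_accuracy(F, labels, n_train=3000):
    """Fit a least-squares linear probe on a train split; report test accuracy."""
    w, *_ = np.linalg.lstsq(F[:n_train], 2.0 * labels[:n_train] - 1.0, rcond=None)
    pred = F[n_train:] @ w > 0
    return np.mean(pred == labels[n_train:])

full_acc = probe_accuracy(X, y)
subset = rng.choice(d, size=k, replace=False)     # randomly chosen neurons
subset_acc = probe_accuracy(X[:, subset], y)
```

Because the signal is diffuse, a random 32-of-512 subset retains most of the probe accuracy of the full representation, which is the qualitative pattern the term "diffused redundancy" describes.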
  • Item
    Structured discovery in graphs: Recommender systems and temporal graph analysis
    (2024) Peyman, Sheyda Do'a; Lyzinski, Vince V.; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Graph-valued data arises in numerous diverse scientific fields ranging from sociology, epidemiology, and genomics to neuroscience and economics. For example, sociologists have used graphs to examine the roles of user attributes (gender, class, year) at American colleges and universities through the study of Facebook friendship networks and have studied segregation and homophily in social networks; epidemiologists have recently modeled Human-nCov protein-protein interactions via graphs; and neuroscientists have used graphs to model neuronal connectomes. The structure of a graph, including its latent features, the relationships between vertices, and the importance of each vertex, comprises highly important graph properties that are central to graph analysis and inference. While it is common to imbue nodes and/or edges with explicitly observed numeric or qualitative features, in this work we consider latent network features that must be estimated from the network topology. The main focus of this text is to find ways of extracting the latent structure in the presence of network anomalies. These anomalies occur in different scenarios: cases where the graph is subject to an adversarial attack and the anomaly is inhibiting inference, and scenarios where detecting the anomaly is the key inference task. The former case is explored in the context of vertex nomination information retrieval, where we consider both analytic methods for countering the adversarial noise and the addition of a user-in-the-loop in the retrieval algorithm to counter potential adversarial noise. In the latter case, we use graph embedding methods to discover sequential anomalies in network time series.
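The embedding-based anomaly detection described for network time series can be illustrated with a toy pipeline: embed each time step's adjacency matrix spectrally, then score consecutive steps by a rotation-invariant subspace distance. This sketch, with invented parameters and a planted anomaly, is our own illustration rather than the dissertation's method.

```python
import numpy as np

rng = np.random.default_rng(3)

def ase(A, d=2):
    """Adjacency spectral embedding: top-d eigenvectors, scaled by sqrt|eigenvalue|."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

def subspace_dist(X, Y):
    """Rotation-invariant dissimilarity between embeddings via principal angles."""
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.sqrt(max(0.0, X.shape[1] - np.sum(s ** 2))))

def sample_sbm(z, p_in, p_out):
    """Two-block stochastic block model with community labels z."""
    P = np.where(np.equal.outer(z, z), p_in, p_out)
    A = (rng.random((len(z), len(z))) < P).astype(float)
    A = np.triu(A, 1)
    return A + A.T

z = np.repeat([0, 1], 30)
z_anom = rng.permutation(z)          # step 6 scrambles the community structure
graphs = [sample_sbm(z_anom if t == 6 else z, 0.5, 0.1) for t in range(10)]
embeddings = [ase(A) for A in graphs]
scores = [subspace_dist(embeddings[t], embeddings[t - 1]) for t in range(1, 10)]
```

The transitions into and out of the anomalous step produce the largest dissimilarity scores, which is the scan-statistic flavor of detection the abstract alludes to.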
  • Item
    ML for Anime: Illustration, Animation, and 3D Characters
    (2024) Chen, Shuhong; Zwicker, Matthias; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
As anime-style content becomes more popular on the global stage, we ask whether new vision/graphics techniques could contribute to the art form. However, the highly-expressive and non-photorealistic nature of anime poses additional challenges not addressed by standard ML models, and much of the existing work in the domain does not align with real artist workflows. In this dissertation, we present several works building foundational 2D/3D infrastructure for ML in anime (including pose estimation, video frame interpolation, and 3D character reconstruction) as well as an interactive tool leveraging novel techniques to assist 2D animators.
  • Item
    Repetitive Patterns in Digital Images: Raster Analysis and Vector Synthesis
    (2024) Tu, Peihan; Zwicker, Matthias; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Repetitive patterns are ubiquitous in various fields, including textile design, digital art, web design, and graphic design, offering significant practical and creative benefits. This dissertation delves into the automated analysis and synthesis of these patterns within digital imagery. We focus on the replication of geometric structures in vector patterns and geometric analysis of raster patterns, highlighting the unique challenges and methodologies involved in each. Creating repetitive vector patterns, characterized by their intricate shapes, is notably challenging and laborious. This intricate process not only demands significant time investment but also requires a unique combination of creative insight and meticulous precision. Although computational methods have streamlined some aspects of manual creation, they predominantly cater to simple elements, leaving complex patterns less explored. To overcome this, we introduce a computational approach for synthesizing continuous structures in vector patterns from exemplars. This approach innovates on existing sample-based discrete element synthesis methods to consider not only sample positions (geometry) but also their connections (topology). Additionally, we present an example-based method to synthesize more general patterns with diverse shapes and structured local interactions, incorporating explicit clustering as part of neighborhood similarity and iterative sample optimization for more robust sample synthesis and pattern reconstruction. Conversely, raster textures, essential visual elements in both real images and computer-generated imagery, present distinct challenges in editing due to their pixel-based nature. Traditional texture editing, often a repetitive and tedious task, requires manual adjustments of textons—small, recurring patterns that define textures. Addressing this, we propose a novel, fully unsupervised method using a compositional neural model to represent textures. 
Each texton is modeled as a 2D Gaussian function, capturing its shape and detailed appearance. This discrete composition of Gaussian textons simplifies editing and enables the efficient synthesis of new textures through a generator network. Our method facilitates a broad spectrum of applications, from texture transfer and diversification to animation and direct manipulation, significantly advancing texture analysis, modeling, and editing techniques. This approach not only enhances the capability for creating visually appealing images with controllable textures but also opens up new avenues for exploration in digital imagery.
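A minimal sketch of the Gaussian-texton idea described above, in our own illustrative code: each texton is a few parameters (position, spread, amplitude), the image is a sum of splatted Gaussians, and editing a texture reduces to editing those parameters. We assume axis-aligned Gaussians for brevity; the dissertation's model also captures detailed appearance.

```python
import numpy as np

def render_textons(textons, height, width):
    """Splat a set of 2D Gaussian textons onto an image grid.
    Each texton is (cx, cy, sigma_x, sigma_y, amplitude); moving or
    rescaling a texton is just a parameter edit followed by a re-render."""
    yy, xx = np.mgrid[0:height, 0:width]
    img = np.zeros((height, width))
    for cx, cy, sx, sy, amp in textons:
        img += amp * np.exp(-(((xx - cx) ** 2) / (2 * sx ** 2)
                              + ((yy - cy) ** 2) / (2 * sy ** 2)))
    return img
```

For example, changing a texton's `cx` from 8 to 20 and re-rendering translates that blob directly, which is the kind of direct manipulation the compositional representation enables.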
  • Item
    Efficient Models and Learning Strategies for Resource-Constrained Systems
    (2024) Rabbani, Tahseen Wahed Karim; Huang, Furong; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The last decade has seen sharp improvements in the performance of machine learning (ML) models but at the cost of vastly increased complexity and size of their underlying architectures. Advances in high-performance computing have enabled researchers to train and deploy models composed of hundreds of billions of parameters. However, harvesting the full utility of large models on smaller clients, such as Internet of Things (IoT) devices, without resorting to external hosting will require a significant reduction of parameters and faster, cheaper inference. In addition to augmenting IoT, efficient models and learning paradigms can reduce energy consumption, encourage technological equity, and are well-suited for deployment in real-world applications that require fast response in low-resource settings. To address these challenges, we introduce multiple, novel strategies for (1) reducing the scale of deep neural networks and (2) faster learning. For the size problem (1), we leverage tools such as tensorization, randomized projections, and locality-sensitive hashing to train on reduced representations of large models without sacrificing performance. For learning efficiency (2), we develop algorithms for cheaper forward passes, accelerated PCA, and asynchronous gradient descent. Several of these methods are tailored for federated learning (FL), a private, distributed learning paradigm where data is decentralized among resource-constrained edge clients. We are exclusively concerned with improving efficiency during training -- our techniques do not process pre-trained models or require a device to train over an architecture in its entirety.
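As one example of the hashing primitive mentioned above, random-hyperplane locality-sensitive hashing assigns neurons with similar weight vectors the same short bit signature, so a group of near-duplicate neurons can share one representative in a reduced representation. This is a generic SimHash sketch under our own toy setup, not the dissertation's algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def simhash_signatures(W, n_bits=16):
    """Random-hyperplane LSH: rows of W with high cosine similarity
    receive (nearly) the same bit signature with high probability."""
    planes = rng.normal(size=(W.shape[1], n_bits))  # random hyperplane normals
    return ((W @ planes) > 0).astype(np.uint8)      # one sign bit per plane

# Toy "layer": 3 groups of 4 near-duplicate neurons (64-d weight vectors).
base = rng.normal(size=(3, 64))
W = np.repeat(base, 4, axis=0) + 0.01 * rng.normal(size=(12, 64))
sigs = simhash_signatures(W)
```

Signatures within a group agree in almost every bit, while signatures across groups differ in roughly half the bits, so bucketing by signature recovers the groups cheaply, without any pairwise distance computation.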
  • Item
    Towards Generalized and Scalable Machine Learning on Structured Data
    (2024) Kong, Kezhi; Goldstein, Tom; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Deep Learning and Neural Networks have brought a transformative era for the field of machine learning, significantly influencing how we approach and utilize structured data. This dissertation is dedicated to exploring machine learning methodologies specifically designed for structured graphs and tables, aiming to enhance the performance of neural networks on these important data modalities. Graph Neural Networks (GNNs) have emerged as powerful architectures for learning and analyzing graph representations. However, the training of GNNs on large-scale datasets usually suffers from overfitting, posing significant generalization challenges for prediction problems. Meanwhile, conventional GNNs are hindered by scalability problems when deployed on industrial-level graph datasets. Moreover, for the table reasoning task, Large Language Models (LLMs) have shown competitive ability but cannot fully process large tables due to context limits and may fail to comprehend the complex relationships within tabular data. In this dissertation, we investigate algorithms and techniques to address the generalization and scalability issues of GNNs, as well as effective and efficient approaches to the table reasoning task. In the first work, we propose to leverage data augmentation to generalize GNNs. We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training. In the second and third works, we look into GNNs’ scalability problem. We propose VQ-GNN, a universal framework to scale up any convolution-based GNNs using Vector Quantization (VQ) without compromising the performance. We further propose GOAT, a global graph transformer that scales to large graphs with millions of nodes and is competitive on tasks of both homophilious and heterophilious graphs. 
Lastly, we propose OpenTab, an effective method for the open-domain table reasoning task, built on advanced Large Language Models.
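The FLAG idea, augmenting input features with iterative gradient-based adversarial perturbations while accumulating parameter gradients, can be sketched on a plain logistic model. This numpy sketch is our own simplification: the actual method perturbs node features inside GNN training, and all names and hyperparameters here are invented for illustration.

```python
import numpy as np

def flag_step(w, X, y, lr=0.1, ascent_lr=0.01, m=3):
    """One FLAG-style training step on a logistic model: run m inner
    ascent steps on an input-feature perturbation delta, accumulating
    the parameter gradient at each perturbed input ('free' augmentation),
    then descend on the averaged gradient."""
    delta = np.zeros_like(X)
    grad_w_acc = np.zeros_like(w)
    for _ in range(m):
        logits = (X + delta) @ w
        p = 1.0 / (1.0 + np.exp(-logits))
        err = p - y                                 # dLoss/dlogits (cross-entropy)
        grad_w_acc += (X + delta).T @ err / len(y)  # accumulate parameter gradient
        grad_delta = np.outer(err, w) / len(y)      # dLoss/d(input features)
        delta += ascent_lr * np.sign(grad_delta)    # ascent step on perturbation
    return w - lr * grad_w_acc / m                  # descend on averaged gradient
```

Training with repeated `flag_step` calls behaves like gradient descent on an adversarially smoothed objective; on clean, separable data it still converges to an accurate classifier while seeing perturbed inputs at every inner step.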