Computer Science Theses and Dissertations

Recent Submissions

Now showing 1 - 20 of 996
  • Item
    Interpretability of Deep Models Across Different Architectures and Modalities
    (2024) Kazemitabaiezavare, Seyedhamidreza; Goldstein, Tom; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The exploration of deep models has been a persistent focus in research over time. Model inversion and feature visualization stand out as significant methods for deciphering the workings of deep models. They play a vital role in unraveling the inner workings of neural architectures, understanding their acquired knowledge, and elucidating their behaviors. This dissertation comprises three chapters dedicated to investigating deep models, particularly newly emerging ones such as Vision Transformers (ViTs) and CLIP, using model inversion and feature visualization techniques. In the first chapter, we introduce Plug-In Inversion, a model inversion method that relies on a set of augmentations instead of regularizers, sidestepping the need for extensive hyperparameter tuning. We demonstrate the efficacy of our approach by applying it to invert Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and Multi-Layer Perceptrons (MLPs). The contributions of this work are summarized as follows: (I) We provide a detailed analysis of various augmentations and how they affect the quality of images produced via class inversion. (II) We introduce Plug-In Inversion (PII), a new class inversion technique based on these augmentations, and compare it to existing techniques. (III) We apply PII to dozens of pre-trained models of varying architecture, justifying our claim that it can be ‘plugged in’ to most networks without modification. (IV) In particular, we show that PII succeeds in inverting ViTs and large MLP-based architectures, which, to our knowledge, has not previously been accomplished. (V) Finally, we explore the potential for combining PII with prior methods. In the second chapter, following PII and the techniques introduced, we utilize model inversion to examine CLIP models. Unlike traditional classification models that are restricted to a predefined set of classes, CLIP models are free of that restriction, and we can apply model inversion to any choice of prompts. Our contributions in this chapter are as follows: (I) In recent years, generative models have shown the capability to blend concepts. We demonstrate that the same holds true for CLIP models, and the knowledge embedded inside CLIP models can blend concepts. (II) We demonstrate that through inversion, seemingly harmless prompts, such as celebrity names, can produce NSFW images. This is particularly true for female celebrities, whom the CLIP model seems to associate strongly with sexual content. Certain identities, like “Dakota Johnson,” are close to many NSFW words in the embedding space. This may be problematic since the embeddings of CLIP models are being used in many text-to-image generative models. Addressing this issue requires more meticulous data curation during the training of large-scale models. (III) We demonstrate that CLIP models display gender bias in their knowledge through inversions applied to prompts related to professions and status. (IV) We investigate the effect of training-data scale on the quality of the inversions and show that more training data leads to better inversions. In the third chapter, we delve into interpreting Vision Transformers using feature visualization. While feature visualizations have provided valuable insights into the workings of Convolutional Neural Networks (CNNs), these methods have struggled to interpret ViT representations due to their inherent complexity.
Nevertheless, we demonstrate that with proper application to the appropriate representations, feature visualizations can be successful with ViTs. This newfound understanding enables us to delve visually into ViTs and the information they extract from images. The contributions of this chapter are as follows: (I) We observe that uninterpretable and adversarial behavior occurs when applying standard methods of feature visualization to the relatively low-dimensional components of transformer-based models, such as keys, queries, or values. However, applying these tools to the relatively high-dimensional features of the position-wise feedforward layer results in successful and informative visualizations. We conduct large-scale visualizations on a wide range of transformer-based vision models, including ViTs, DeiT, CoaT, ConViT, PiT, Swin, and Twin, to validate the effectiveness of our method. (II) We show that patch-wise image activation patterns for ViT features essentially behave like saliency maps, highlighting the regions of the image a given feature attends to. This behavior persists even for relatively deep layers, showing the model preserves the positional relationship between patches instead of using them as global information stores. (III) We compare the behavior of ViTs and CNNs, finding that ViTs make better use of background information and rely less on high-frequency, textural attributes. Both types of networks build progressively more complex representations in deeper layers and eventually contain features responsible for detecting distinct objects. (IV) We investigate the effect of natural language supervision with CLIP on the types of features extracted by ViTs. We find CLIP-trained models include various features clearly catered to detecting components of images corresponding to caption text, such as prepositions, adjectives, and conceptual categories.
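
For readers who want a concrete picture of augmentation-driven class inversion, here is a minimal sketch in the spirit of Plug-In Inversion: an input image is optimized to excite a chosen class logit while random augmentations, rather than image regularizers, shape the result. The specific augmentations, step counts, and sigmoid parameterization below are illustrative assumptions, not the dissertation's exact recipe.

```python
# Illustrative sketch of augmentation-driven class inversion (in the spirit of
# Plug-In Inversion). Augmentation choices and settings are assumed for
# illustration and are not the dissertation's exact recipe.
import torch
from torchvision import models, transforms

def invert_class(model, target_class, steps=500, lr=0.05, image_size=224, device="cpu"):
    model = model.to(device).eval()
    # Start from a small random image and optimize it directly.
    x = torch.randn(1, 3, image_size, image_size, device=device, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    augment = transforms.Compose([
        transforms.RandomResizedCrop(image_size, scale=(0.6, 1.0)),  # random zoom/crop
        transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),  # spatial jitter
        transforms.ColorJitter(brightness=0.2, contrast=0.2),        # color jitter
    ])
    for _ in range(steps):
        opt.zero_grad()
        # Augmentations (rather than image priors/regularizers) shape the image.
        logits = model(augment(torch.sigmoid(x)))
        loss = -logits[0, target_class]
        loss.backward()
        opt.step()
    return torch.sigmoid(x).detach()

# Usage: any off-the-shelf classifier can be "plugged in" without modification.
# img = invert_class(models.resnet18(weights="IMAGENET1K_V1"), target_class=207)
```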
  • Item
    MATHEMATICS OF THE DYNAMICS AND CONTROL OF THE SARS-COV-2 PANDEMIC
    (2024) Pant, Binod; Gumel, Abba B.; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The pneumonia-like illness that emerged late in 2019, caused by SARS-CoV-2 (and coined COVID-19), became the greatest public health challenge humans have faced since the 1918/1919 influenza pandemic, causing over 670 million confirmed cases and 7 million fatalities globally. This dissertation contributes to providing deep qualitative insights into the transmission dynamics and control of the pandemic, using mathematical modeling approaches together with data analytics and computation. Specifically, it addresses some of the pertinent challenges associated with modeling the dynamics of the disease, notably the disproportionate effect of the disease on certain (risk and demographic) populations (inducing various heterogeneities) and behavior changes with respect to adherence or lack thereof to interventions. An m-group model, which monitors the temporal dynamics of the disease in m heterogeneous populations, was designed and used to study the impact of age heterogeneity and vaccination on the spread of the disease in the United States. For instance, the disease-free equilibrium for the case of the model with m = 1 (i.e., the model with a homogeneous population) was shown to be globally asymptotically stable for two special cases (when the vaccine is perfect or when disease-induced mortality is negligible) whenever the associated reproduction number of the homogeneous model is less than one. The homogeneous model has a unique endemic equilibrium whenever the reproduction threshold exceeds unity (this equilibrium was shown to be globally asymptotically stable for a special case, using a nonlinear Lyapunov function of Goh-Volterra type). The homogeneous model was fitted to the observed cumulative mortality data for the SARS-CoV-2 pandemic in the United States during the period from January to May of 2022 (when Omicron was the predominant variant). It was shown that vaccine-derived herd immunity (needed to eliminate the disease) cannot be attained using the homogeneous model regardless of the proportion of individuals fully vaccinated. Such vaccine-derived immunity can, however, be achieved using the m-group heterogeneous model, with m = 2 (where the total population is split into two groups: those under 65 years of age, and those 65 years and older), if at least 61% of the susceptible population is fully vaccinated. Thus, this dissertation shows that heterogeneity reduces the level of vaccine coverage needed to eliminate the pandemic (and models that do not account for heterogeneity may be overestimating the vaccination coverage needed to achieve herd immunity in the community). To quantify the impact of human behavior changes on the spread and control of the pandemic, we designed a novel behavior-epidemiology model which considers numerous metrics for inducing human behavior changes (such as the current level of disease burden and intervention adherence fatigue). Unlike the equivalent model without human behavior explicitly incorporated, the behavior-epidemiology model fits the observed cumulative mortality and predicts the observed daily mortality data very well. It was also shown that the behavior metrics related to the level of SARS-CoV-2 mortality and symptomatic transmission were more influential in inducing positive behavior changes than all other behavior metrics considered. Finally, a model was developed to assess the utility of wastewater surveillance to study the transmission dynamics and control of SARS-CoV-2 in a community.
    Specifically, we developed and calibrated a wastewater-based epidemiology model using wastewater data from Miami-Dade county, Florida, during the third wave of the SARS-CoV-2 pandemic. The model showed a strong correlation between the observed (detected) weekly case data and the corresponding weekly data predicted by the calibrated model. The model's prediction of the week when the maximum number of SARS-CoV-2 cases would be recorded in the county during the simulation period precisely matched the week when the maximum observed/reported cases were recorded (August 14, 2021). Furthermore, the model's projection of the maximum number of cases for the week of August 14, 2021 was about 15 times higher than the maximum observed weekly case count for the county that week (i.e., the model's estimate exceeded the count of confirmed cases by a factor of about 15). In addition to being in line with other modeling studies, this result is consistent with the CDC estimate that reported confirmed cases may understate the actual number by a factor of about 10 (since the confirmed data did not account for asymptomatic and presymptomatic transmission). Furthermore, the model accurately predicted a one-week lag between the peak in weekly COVID-19 case and hospitalization data during the time period of the study in Miami-Dade, with the model-predicted hospitalizations peaking on August 21, 2021. A detailed time-varying global sensitivity analysis was carried out to determine the parameters (wastewater-based, epidemiological and biological) that have the most influence on the chosen response function (namely, the cumulative viral load in the wastewater). This analysis identified key parameters that significantly affect the value of the response function (hence, they should be targeted for intervention). This dissertation conclusively showed that wastewater surveillance data can be a very powerful indicator for measuring (i.e., providing an early-warning signal and the current burden) and predicting the future trajectory and burden (e.g., number of cases and hospitalizations) of emerging and re-emerging infectious diseases, such as SARS-CoV-2, in a community.
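
As a concrete illustration of the reproduction-number reasoning behind vaccine-derived herd immunity, the following sketch computes a control reproduction number and the coverage threshold for a homogeneous population with an imperfect vaccine. The functional form and the parameter values are assumptions for illustration, not the dissertation's fitted m-group model.

```python
# Minimal sketch of reproduction-number / herd-immunity reasoning for a
# homogeneous model with an imperfect vaccine. Structure and parameter values
# are illustrative assumptions, not the dissertation's model.
import numpy as np

def control_reproduction_number(R0, coverage, efficacy):
    # A fraction `coverage` of the population is vaccinated with a vaccine of
    # the given efficacy; effective susceptibility drops accordingly.
    return R0 * (1.0 - efficacy * coverage)

def herd_immunity_coverage(R0, efficacy):
    # Coverage needed to push the control reproduction number below 1.
    p = (1.0 - 1.0 / R0) / efficacy
    return p if p <= 1.0 else np.inf   # np.inf: unattainable with this vaccine

if __name__ == "__main__":
    R0, efficacy = 3.0, 0.8                                          # assumed values
    print(control_reproduction_number(R0, coverage=0.6, efficacy=efficacy))  # 1.56
    print(herd_immunity_coverage(R0, efficacy))                      # ~0.83 coverage
```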
  • Item
    Quantum Algorithms for Nonconvex Optimization: Theory and Implementation
    (2024) Leng, Jiaqi; Wu, Xiaodi; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Continuous optimization problems arise in virtually all disciplines of quantitative research. While convex optimization has been well-studied in recent decades, large-scale nonconvex optimization problems remain intractable in both theory and practice. Quantum computers are expected to outperform classical computers in certain challenging computational problems. Some quantum algorithms for convex optimization have shown asymptotic speedups, while the quantum advantage for nonconvex optimization is yet to be fully understood. This thesis focuses on Quantum Hamiltonian Descent (QHD), a quantum algorithm for continuous optimization problems. A systematic study of Quantum Hamiltonian Descent is presented, including theoretical results concerning nonconvex optimization and efficient implementation techniques for quantum computers. Quantum Hamiltonian Descent is derived as the path integral of classical gradient descent algorithms. Due to the quantum interference of classical descent trajectories, Quantum Hamiltonian Descent exhibits drastically different behavior from classical gradient descent, especially for nonconvex problems. Under mild assumptions, we prove that Quantum Hamiltonian Descent can always find the global minimum of an unconstrained optimization problem given a sufficiently long runtime. Moreover, we demonstrate that Quantum Hamiltonian Descent can efficiently solve a family of nonconvex optimization problems with exponentially many local minima, which most commonly used classical optimizers require super-polynomial time to solve. Additionally, by using Quantum Hamiltonian Descent as an algorithmic primitive, we show a quadratic oracular separation between quantum and classical computing. We consider the implementation of Quantum Hamiltonian Descent for two important paradigms of quantum computing, namely digital (fault-tolerant) and analog quantum computers. Exploiting the product formula for quantum Hamiltonian simulation, we demonstrate that a digital quantum computer can implement Quantum Hamiltonian Descent with gate complexity nearly linear in problem dimension and evolution time. With a hardware-efficient sparse Hamiltonian simulation technique known as Hamiltonian embedding, we develop an analog implementation recipe for Quantum Hamiltonian Descent that addresses a broad class of nonlinear optimization problems, including nonconvex quadratic programming. This analog implementation approach is deployed on large-scale quantum spin-glass simulators, and the empirical results strongly suggest that Quantum Hamiltonian Descent has great potential for highly nonconvex and nonlinear optimization tasks.
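
To make the idea of Quantum Hamiltonian Descent more concrete, here is a small classical simulation sketch: a wave packet evolves under a time-dependent Hamiltonian whose kinetic weight decays while the weight on the objective grows, and the final position distribution is read out as a candidate minimizer. The one-dimensional objective, the weight schedule, and the split-operator discretization are illustrative assumptions, not the algorithm's exact formulation or its digital/analog implementations.

```python
# Illustrative 1D classical simulation of a QHD-style evolution via the
# split-operator (Trotter) method. All schedules and parameters are assumed.
import numpy as np

def qhd_sketch_1d(f, x_min=-4.0, x_max=4.0, n=512, T=10.0, steps=4000):
    x = np.linspace(x_min, x_max, n, endpoint=False)
    V = f(x)
    psi = np.exp(-x**2).astype(complex)           # broad initial wave packet
    psi /= np.linalg.norm(psi)
    k = 2 * np.pi * np.fft.fftfreq(n, d=(x_max - x_min) / n)
    dt = T / steps
    for s in range(steps):
        t = (s + 0.5) * dt
        # Kinetic weight decays, objective weight grows (illustrative schedule).
        a, b = 1.0 / (1.0 + t) ** 2, (1.0 + t) ** 2
        # Half potential step, full kinetic step (in Fourier space), half potential step.
        psi *= np.exp(-0.5j * dt * b * V)
        psi = np.fft.ifft(np.exp(-1j * dt * a * k**2 / 2.0) * np.fft.fft(psi))
        psi *= np.exp(-0.5j * dt * b * V)
    prob = np.abs(psi) ** 2
    return x[np.argmax(prob)]                     # most likely position ~ candidate minimizer

# Usage on a nonconvex 1D objective with two minima (tilted double well):
# print(qhd_sketch_1d(lambda x: (x**2 - 1.0)**2 + 0.3 * x))
```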
  • Item
    Towards Immersive Streaming for Videos and Light Fields
    (2024) Li, David; Varshney, Amitabh; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    As virtual and augmented reality devices evolve with new applications, the ability to create and transmit immersive content becomes ever more critical. In particular, mobile, standalone devices have power, computing, and bandwidth limitations which require careful thought on how to deliver content to users. In this dissertation, we examine techniques to enable adaptive streaming of two types of content: 360° panoramic videos and light fields. With the rapidly increasing resolutions of 360° cameras, head-mounted displays, and live-streaming services, streaming high-resolution panoramic videos over limited-bandwidth networks is becoming a critical challenge. Foveated video streaming can address this rising challenge in the context of eye-tracking-equipped virtual reality head-mounted displays. We introduce a new log-rectilinear transformation incorporating summed-area table filtering and off-the-shelf video codecs to enable foveated streaming of 360° videos suitable for VR headsets with built-in eye-tracking. Our technique results in a 31% decrease in flickering and a 10% decrease in bit rate with H.264 streaming while maintaining similar or better quality. Neural representations have shown great promise in compactly representing radiance and light fields. However, existing neural representations are not suited for streaming as decoding can only be done at a single level of detail and requires downloading the entire neural network model. To resolve these challenges, we present a progressive multi-scale light field network that encodes light fields with multiple levels of detail across various subsets of the network weights. With our approach, light field networks can render starting with less than 7% of the model weights and progressively depict greater levels of detail as more model weights are streamed. Existing methods for levels of detail in neural representations focus on a few discrete levels of detail. While a few discrete LODs are enough to enable progressive streaming and reduce artifacts, transitioning between LODs becomes a challenge as an instant transition can result in a popping artifact, blending requires two render passes at adjacent LODs, and dithering can briefly appear as flickering. Additionally, models with a few LODs create large model deltas and can only coarsely adapt to bandwidth and compute resources. To address these limitations, we present continuous levels of detail for light field networks to address flickering artifacts during transitions across levels of detail and enable more granular adaptation to available resources. With our approach, we reduce flickering between successive model updates by approximately 40-80% and go from 4 performance levels to 385 performance levels from which the model can be executed. By rendering levels of detail at each possible network width, we additionally reduce the model size deltas from over a hundred rows and columns per layer down to a single row and column per layer, for smoother streaming potential.
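
The summed-area-table ingredient of the foveated pipeline is easy to sketch: an integral image gives constant-time box averages, so the filter size can grow with distance from the gaze point. The linear growth of the kernel with eccentricity below is an illustrative assumption rather than the exact log-rectilinear mapping.

```python
# Illustrative summed-area-table (integral image) foveation: larger box filters
# farther from the gaze point. The eccentricity-to-kernel mapping is assumed.
import numpy as np

def summed_area_table(img):
    # Integral image with a zero top row/left column for easy box sums.
    return np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def box_mean(sat, y0, y1, x0, x1):
    # Mean over img[y0:y1, x0:x1] in O(1) using four SAT lookups.
    area = (y1 - y0) * (x1 - x0)
    return (sat[y1, x1] - sat[y0, x1] - sat[y1, x0] + sat[y0, x0]) / area

def foveate(img, gaze, max_kernel=16):
    h, w = img.shape
    sat = summed_area_table(img)
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(ys - gaze[0], xs - gaze[1])
    radius = np.minimum((dist / max(h, w) * max_kernel).astype(int), max_kernel)
    out = np.empty_like(img, dtype=float)
    for y in range(h):
        for x in range(w):
            r = radius[y, x]
            out[y, x] = box_mean(sat, max(0, y - r), min(h, y + r + 1),
                                 max(0, x - r), min(w, x + r + 1))
    return out

# Usage: foveate(np.random.rand(128, 128), gaze=(64, 64))
```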
  • Item
    CUR Matrix Approximation Through Convex Optimization
    (2024) Linehan, Kathryn; Balan, Radu V; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In this dissertation, we present work on the CUR matrix approximation. Specifically, we present 1) an approximation of the proximal operator of the L-infinity norm using a neural network, 2) a novel deterministic CUR formulation and algorithm, and 3) a novel application of CUR as a feature selection method to determine discriminant proteins when clustering protein expression data in a self-organizing map (SOM). The proximal operator of the L-infinity norm arises in our CUR algorithm. Since the computation of the proximal operator of the L-infinity norm requires a sort of the input data (or at least a partial sort similar to quicksort), we present a neural network to approximate the proximal operator. A novel aspect of the network is that it is able to accept vectors of varying lengths due to a feature selection process that uses moments of the input data. We present results on the accuracy, feature importance, and computational efficiency of the approximation; we also present an algorithm to calculate the proximal operator of the L-infinity norm exactly, relate it to the Moreau decomposition, and compare its computational efficiency to that of the approximation. Next, we present a novel deterministic CUR formulation that uses convex optimization to form the matrices C and R, and a corresponding algorithm that uses bisection to ensure that the user-selected number of columns appear in C and the user-selected number of rows appear in R. We implement the algorithm using the surrogate functional technique of Daubechies et al. [Communications on Pure and Applied Mathematics, 57.11 (2004)] and extend the theory of this approach to apply to our CUR formulation. Numerical results are presented that demonstrate the effectiveness of our CUR algorithm as compared to the singular value decomposition (SVD) and other CUR algorithms. Last, we use our CUR approximation as a feature selection method in the application by Higuera et al. [PLOS ONE, 10(6) (2015)] to determine discriminant proteins when clustering protein expression data in an SOM. This is a novel application of CUR and, to the best of our knowledge, the first use of CUR on protein expression data. We compare the performance of our CUR algorithm to other CUR algorithms and the Wilcoxon rank-sum test (the original feature selection method in the work).
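
Because the proximal operator of the L-infinity norm and its Moreau decomposition are central to this abstract, here is a small reference sketch that computes it exactly by projecting onto an L1 ball with the standard sort-and-threshold routine; the variable names and the usage example are illustrative, not taken from the dissertation.

```python
# Exact prox of lam*||.||_inf via the Moreau decomposition:
# prox_{lam*||.||_inf}(v) = v - projection of v onto the L1 ball of radius lam.
import numpy as np

def project_l1_ball(v, radius=1.0):
    # Euclidean projection onto {x : ||x||_1 <= radius} (sort-and-threshold).
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - radius))[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(v, lam=1.0):
    return v - project_l1_ball(v, radius=lam)

# Usage (illustrative): prox_linf([3, -1, 0.5], lam=1) -> [2, -1, 0.5]
# print(prox_linf(np.array([3.0, -1.0, 0.5]), lam=1.0))
```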
  • Item
    Towards Effective and Inclusive AI: Aligning AI Systems with User Needs and Stakeholder Values Across Diverse Contexts
    (2024) Cao, Yang; Daumé III, Hal; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Inspired by the Turing test, a long line of research in AI has focused on technical improvements on tasks thought to require human-like comprehension. However, this focus has often resulted in models with impressive technical capabilities but uncertain real-world applicability. Despite the advancements of large pre-trained models, we still see failures affecting marginalized groups and arising in specific applications. A major problem here is the detached model development process: these models are designed, developed, and evaluated with limited consideration of their users and stakeholders. My dissertation is dedicated to addressing this detachment by examining how artificial intelligence (AI) systems can be more effectively aligned with the needs of users and the values of stakeholders across diverse contexts. This work aims to close the gap between the current state of AI technology and its meaningful application in the lives of real-life stakeholders. My thesis explores three key aspects of aligning AI systems with human needs and values: identifying sources of misalignment, addressing the needs of specific user groups, and ensuring value alignment across diverse stakeholders. First, I examine potential causes of misalignment in AI system development, focusing on gender biases in natural language processing (NLP) systems. I demonstrate that without careful consideration of real-life stakeholders, AI systems are prone to biases entering at each development stage. Second, I explore the alignment of AI systems for specific user groups by analyzing two real-life application contexts: a content moderation assistance system for volunteer moderators and a visual question answering (VQA) system for blind and visually impaired (BVI) individuals. In both contexts, I identify significant gaps in AI systems and provide directions for better alignment with users’ needs. Finally, I assess the alignment of AI systems with human values, focusing on stereotype issues within general large language models (LLMs). I propose a theory-grounded method for systematically evaluating stereotypical associations and exploring their impact on diverse user identities, including intersectional identity stereotypes and the leakage of stereotypes across cultures. Through these investigations, this dissertation contributes to the growing field of human-centered AI by providing insights, methodologies, and recommendations for aligning AI systems with the needs and values of diverse stakeholders. By addressing the challenges of misalignment, user-specific needs, and value alignment, this work aims to foster the development of AI technologies that effectively collaborate with and empower users while promoting fairness, inclusivity, and positive social impact.
  • Item
    Advancing the State of Auto-tuning with Programming Languages
    (2024) Chen, Ray Sun; Hollingsworth, Jeffrey K; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In the realm of computer science, auto-tuning refers to techniques for software performance optimization. The focus of traditional auto-tuning research is to identify novel performance parameters to expand the optimization space for a given target software/platform combination, and improve the automated search within this optimization space. This makes high-performance computing (HPC) a prime candidate for auto-tuning research, as it sits at the nexus of architectural diversity and performance criticality. However, the major successes for HPC auto-tuning to date involve tailoring memory access patterns to specific cache hierarchies. While important, this is just a small piece of the overall performance portability puzzle. I argue that auto-tuning has room to expand and optimize a richer set of HPC application tuning parameters through the combination of novel non-intrusive programming language idioms and advanced lightweight online search techniques. I support my argument through four contributions to the field. This dissertation describes two techniques for expanding auto-tuning optimization spaces, and two techniques for distributing the auto-tuning search for parallel efficiency.
  • Item
    THEORETICAL AND PRACTICAL HIGH-ASSURANCE SOFTWARE TOOLS FOR QUANTUM APPLICATIONS
    (2024) Peng, Yuxiang; Wu, Xiaodi; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Quantum computing promises to revolutionize solutions to valuable computational problems like factorization and quantum system simulation. Harnessing the power of quantum computers in real life cannot happen without a supportive software stack. This thesis studies the fundamental problems appearing in the software for quantum computing to shed light on shaping the software that will control quantum computing devices in both the near term and the long run. First, we consider the software for Hamiltonian-oriented quantum computing (HOQC). We discuss our new software framework, SimuQ, for programming quantum Hamiltonian simulation on analog quantum simulators (AQS). To characterize the capability of AQSs, we introduce new abstractions named abstract analog instruction sets (AAIS), which enable us to design several domain-specific languages and a novel compiler. SimuQ reduces knowledge barriers for the study of HOQC because front-end users can easily program Hamiltonian systems and simulate their dynamics on AQSs. We then demonstrate how SimuQ inspires new algorithm design and hardware improvements, and extend our view to the full-stack software support of quantum applications via HOQC. Second, we study the software for circuit-oriented quantum computing (COQC). Many prototypes of quantum programming languages based on quantum circuits have been proposed in recent years, yet how to debug programs written in these languages remains a critical challenge. Formal methods, a mature area in programming language research, provide a new perspective on guaranteeing the correctness of quantum programs, and we adapt several formal verification techniques to accommodate the nature of quantum programs. In compilers for quantum while-programs, rewrite rules need to preserve the semantics of the programs. We connect non-idempotent Kleene algebra with quantum program semantics and propose an equivalence-proving system with algebraic expression derivation. In addition, deductive program verification can be leveraged to ensure the semantic correctness of quantum programs. To this end, we formally certify an end-to-end implementation of Shor's algorithm, justifying the feasibility of deductive verification for quantum programs.
  • Item
    EFFICIENT COMPUTATIONAL ALGORITHMS FOR MAGNETIC EQUILIBRIUM IN A FUSION REACTOR
    (2024) Liang, Jiaxing; Elman, Howard C.; Sanchez-Vizuet, Tonatiuh; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In a magnetic confinement fusion reactor, like a Tokamak, hydrogen isotopes are injected into the chamber and heated to form a plasma. The hot plasma tends to expand, so it is crucial to confine it within the center of the reactor to prevent contact with the reactor walls. This confinement is achieved through magnetic fields generated by currents running through external coils surrounding the reactor. However, these currents are subject to uncertainties, which could arise from factors like temperature fluctuations and material impurities. These uncertainties introduce a level of unpredictability in the plasma's behavior and the overall stability of the fusion process. This thesis aims to investigate the impact that stochasticity in the current intensities has on the confinement properties of the plasma and to estimate the expected behavior of the magnetic field. While the focus is on the variability in current intensities, the tools developed can be applied to other sources of uncertainty, such as the positioning of coils and the source term parameters. To quantify the variability in model predictions and to evaluate the statistical properties of solutions over a range of parameter values, traditional sampling methods like Monte Carlo often require intensive and expensive nonlinear computations. To tackle this challenge, we propose three approaches. Firstly, we focus on the development and application of a surrogate function, constructed via a stochastic collocation approach on a sparse grid in the parameter space. This surrogate function is employed to replace the nonlinear solution in Monte Carlo sampling processes. In our numerical experiments, we evaluate the efficiency and accuracy of the outcomes produced by the surrogate, in comparison with those obtained through direct nonlinear solutions. Our findings indicate that a carefully selected surrogate function reduces the sampling cost, achieving acceleration factors ranging from 7 to over 30, while preserving the accuracy of the results. The second part of the thesis explores the multilevel Monte Carlo approach, investigating its potential for cost savings compared to simple Monte Carlo. This method involves conducting the majority of computations on a sequence of coarser spatial grids compared to what a simple Monte Carlo simulation would typically use. We examine this approach with nonlinear computations, using both uniformly refined meshes and adaptively refined grids guided by a discrete error estimator. Numerical experiments reveal substantial cost reductions achieved through multilevel methods, typically ranging from a factor of 60 to more than 200. Adaptive gridding results in more accurate computation of relevant geometric parameters. In the last part of this work, we explore hybrid methods that integrate surrogates with multilevel Monte Carlo to further reduce the sampling cost. We establish the optimal construction and sampling costs for the surrogate-based multilevel Monte Carlo. Numerical results demonstrate that surrogate-based multilevel Monte Carlo remarkably reduces the computational burden, requiring only 0.1 to 14 seconds for a target relative mean square error ranging from 8×10⁻³ to 2×10⁻⁴, reducing the cost of direct computation by factors of 50 to 300. In terms of accuracy, the surrogate-based sampling results exhibit close congruence with those obtained via direct computation, both in the plasma boundary and in geometric descriptors.
    The primary contributions of our work are the application of stochastic collocation techniques and multilevel Monte Carlo methods to analyze plasma behavior under uncertainty in the coil currents of fusion reactors. Furthermore, we establish the universal sampling cost for the surrogate-enhanced multilevel Monte Carlo approach. Our methodology offers a new paradigm for approaching and managing computational challenges in this field.
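
A minimal sketch of the plain multilevel Monte Carlo estimator that this work builds on, with a toy scalar quantity of interest standing in for the nonlinear equilibrium solve; the level hierarchy, sample allocation, and toy model are illustrative assumptions.

```python
# Plain multilevel Monte Carlo: E[Q_L] ~= E[Q_0] + sum_l E[Q_l - Q_{l-1}],
# with most samples drawn on the cheap coarse levels. All settings are assumed.
import numpy as np

def mlmc_estimate(sample_qoi, levels, samples_per_level, rng=None):
    rng = rng or np.random.default_rng(0)
    total = 0.0
    for level, n in zip(levels, samples_per_level):
        omegas = rng.standard_normal(n)              # random inputs (e.g. coil currents)
        fine = sample_qoi(level, omegas)
        if level == levels[0]:
            total += fine.mean()                     # coarsest level: plain mean
        else:
            coarse = sample_qoi(level - 1, omegas)   # same randomness, coarser grid
            total += (fine - coarse).mean()          # telescoping correction
    return total

# Toy "solver": a quantity of interest whose discretization error shrinks with
# level (a stand-in for the nonlinear equilibrium computation).
def toy_qoi(level, omegas):
    h = 2.0 ** (-level)
    return np.sin(1.0 + 0.1 * omegas) + h**2 * np.cos(omegas)

# print(mlmc_estimate(toy_qoi, levels=[0, 1, 2, 3], samples_per_level=[4000, 1000, 250, 60]))
```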
  • Item
    FOUNDATIONS OF TRUSTWORTHY DEEP LEARNING: FAIRNESS, ROBUSTNESS, AND EXPLAINABILITY
    (2024) Nanda, Vedant; Dickerson, John; Gummadi, Krishna; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Deep Learning (DL) models, especially with the rise of the so-called foundation models, are increasingly used in real-world applications as autonomous systems (e.g., facial recognition), as decision aids (e.g., medical imaging, writing assistants), and even to generate novel content (e.g., chatbots, image generators). This naturally results in concerns about the trustworthiness of these systems, for example: do the models systematically perform worse for certain subgroups? Are the outputs of these models reliable under perturbations to the inputs? This thesis aims to strengthen the foundations of DL models, so they can be trusted in deployment. I will cover three important aspects of trust: fairness, robustness, and explainability. I will argue that we need to expand the scope of each of these aspects when applying them to DL models and carefully consider possible tradeoffs between these desirable but sometimes conflicting notions of trust. Traditionally, the fairness community has worked on mitigating biases in classical models such as Support Vector Machines (SVMs) and logistic regression. However, many real-world applications, where bias shows up in myriad ways, involve much more complicated DL models. In the first part, I will present two works that show how thinking about fairness for DL introduces new challenges, especially due to these models' overparametrized nature and susceptibility to adversarial attacks. The robustness literature has focused largely on measuring the invariance of models to carefully constructed (adversarial attacks) or natural (distribution shifts) noise. In the second part, I will argue that to get truly robust models, we must focus on a more general notion of robustness: measuring the alignment of invariances of DL models with other models of perception such as humans. I will present two works that measure shared invariances (1) between DL models and humans and (2) between DL models. Such measurements of robustness provide a measure of "relative robustness," through which we can better understand the failure modes of DL models and work towards building truly robust systems. Finally, in the third part, I will show how even a small subset of randomly chosen neurons from a pre-trained representation can transfer very well to downstream tasks. We call this phenomenon "diffused redundancy," which we observe in a variety of pre-trained representations. This finding challenges existing beliefs in the explainability literature that claim individual neurons learn disjoint semantically meaningful concepts.
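
The diffused-redundancy observation can be probed with a very small experiment: keep a random subset of a pre-trained representation's neurons and fit a linear probe on a downstream task. The probe, subset size, and placeholder data below are illustrative assumptions, not the thesis's experimental setup.

```python
# Illustrative probe of "diffused redundancy": evaluate downstream accuracy
# using only a random fraction of a representation's neurons (columns).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_random_subset(features, labels, keep_fraction=0.1, seed=0):
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    idx = rng.choice(d, size=max(1, int(keep_fraction * d)), replace=False)
    X_tr, X_te, y_tr, y_te = train_test_split(
        features[:, idx], labels, test_size=0.2, random_state=seed)
    probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)  # linear probe
    return probe.score(X_te, y_te)

# Usage with any pre-trained representation (random data as a placeholder here):
# feats, labs = np.random.randn(1000, 512), np.random.randint(0, 10, 1000)
# print(probe_random_subset(feats, labs, keep_fraction=0.1))
```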
  • Item
    Structured discovery in graphs: Recommender systems and temporal graph analysis
    (2024) Peyman, Sheyda Do'a; Lyzinski, Vince V.; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Graph-valued data arises in numerous diverse scientific fields ranging from sociology, epidemiology and genomics to neuroscience and economics. For example, sociologists have used graphs to examine the roles of user attributes (gender, class, year) at American colleges and universities through the study of Facebook friendship networks and have studied segregation and homophily in social networks; epidemiologists have recently modeled Human-nCov protein-protein interactions via graphs, and neuroscientists have used graphs to model neuronal connectomes. The structure of graphs, including latent features, relationships between vertices, and the importance of each vertex, comprises highly important graph properties that are central to graph analysis and inference. While it is common to imbue nodes and/or edges with implicitly observed numeric or qualitative features, in this work we will consider latent network features that must be estimated from the network topology. The main focus of this text is to find ways of extracting the latent structure in the presence of network anomalies. These anomalies occur in different scenarios, including cases where the graph is subject to an adversarial attack and the anomaly inhibits inference, and cases where detecting the anomaly is itself the key inference task. The former case is explored in the context of vertex nomination information retrieval, where we consider both analytic methods for countering the adversarial noise and the addition of a user-in-the-loop to the retrieval algorithm. In the latter case we use graph embedding methods to discover sequential anomalies in network time series.
  • Item
    ML for Anime: Illustration, Animation, and 3D Characters
    (2024) Chen, Shuhong; Zwicker, Matthias; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    As anime-style content becomes more popular on the global stage, we ask whether new vision/graphics techniques could contribute to the art form. However, the highly expressive and non-photorealistic nature of anime poses additional challenges not addressed by standard ML models, and much of the existing work in the domain does not align with real artist workflows. In this dissertation, we present several works building foundational 2D/3D infrastructure for ML in anime (including pose estimation, video frame interpolation, and 3D character reconstruction) as well as an interactive tool leveraging novel techniques to assist 2D animators.
  • Item
    Repetitive Patterns in Digital Images: Raster Analysis and Vector Synthesis
    (2024) Tu, Peihan; Zwicker, Matthias; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Repetitive patterns are ubiquitous in various fields, including textile design, digital art, web design, and graphic design, offering significant practical and creative benefits. This dissertation delves into the automated analysis and synthesis of these patterns within digital imagery. We focus on the replication of geometric structures in vector patterns and geometric analysis of raster patterns, highlighting the unique challenges and methodologies involved in each. Creating repetitive vector patterns, characterized by their intricate shapes, is notably challenging and laborious. This intricate process not only demands significant time investment but also requires a unique combination of creative insight and meticulous precision. Although computational methods have streamlined some aspects of manual creation, they predominantly cater to simple elements, leaving complex patterns less explored. To overcome this, we introduce a computational approach for synthesizing continuous structures in vector patterns from exemplars. This approach innovates on existing sample-based discrete element synthesis methods to consider not only sample positions (geometry) but also their connections (topology). Additionally, we present an example-based method to synthesize more general patterns with diverse shapes and structured local interactions, incorporating explicit clustering as part of neighborhood similarity and iterative sample optimization for more robust sample synthesis and pattern reconstruction. Conversely, raster textures, essential visual elements in both real images and computer-generated imagery, present distinct challenges in editing due to their pixel-based nature. Traditional texture editing, often a repetitive and tedious task, requires manual adjustments of textons—small, recurring patterns that define textures. Addressing this, we propose a novel, fully unsupervised method using a compositional neural model to represent textures. Each texton is modeled as a 2D Gaussian function, capturing its shape and detailed appearance. This discrete composition of Gaussian textons simplifies editing and enables the efficient synthesis of new textures through a generator network. Our method facilitates a broad spectrum of applications, from texture transfer and diversification to animation and direct manipulation, significantly advancing texture analysis, modeling, and editing techniques. This approach not only enhances the capability for creating visually appealing images with controllable textures but also opens up new avenues for exploration in digital imagery.
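
A minimal sketch of the compositional idea of Gaussian textons: a grayscale texture rendered as a sum of isotropic 2D Gaussians, so editing reduces to moving or rescaling individual textons. The isotropy and the jittered-grid layout are simplifying assumptions; the dissertation's model also learns per-texton appearance.

```python
# Illustrative composition of a texture from isotropic 2D Gaussian "textons".
import numpy as np

def render_gaussian_textons(centers, sigmas, amplitudes, size=128):
    ys, xs = np.mgrid[0:size, 0:size]
    img = np.zeros((size, size))
    for (cy, cx), s, a in zip(centers, sigmas, amplitudes):
        # Each texton contributes an isotropic Gaussian bump at its center.
        img += a * np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * s**2))
    return img

# Usage: a loose grid of textons with jittered positions (illustrative layout).
rng = np.random.default_rng(0)
centers = [(y + rng.normal(0, 2), x + rng.normal(0, 2))
           for y in range(8, 128, 16) for x in range(8, 128, 16)]
texture = render_gaussian_textons(centers, sigmas=[3.0] * len(centers),
                                  amplitudes=[1.0] * len(centers))
```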
  • Item
    Efficient Models and Learning Strategies for Resource-Constrained Systems
    (2024) Rabbani, Tahseen Wahed Karim; Huang, Furong; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The last decade has seen sharp improvements in the performance of machine learning (ML) models but at the cost of vastly increased complexity and size of their underlying architectures. Advances in high-performance computing have enabled researchers to train and deploy models composed of hundreds of billions of parameters. However, harvesting the full utility of large models on smaller clients, such as Internet of Things (IoT) devices, without resorting to external hosting will require a significant reduction of parameters and faster, cheaper inference. In addition to augmenting IoT, efficient models and learning paradigms can reduce energy consumption and encourage technological equity, and they are well-suited for deployment in real-world applications that require fast response in low-resource settings. To address these challenges, we introduce multiple novel strategies for (1) reducing the scale of deep neural networks and (2) faster learning. For the size problem (1), we leverage tools such as tensorization, randomized projections, and locality-sensitive hashing to train on reduced representations of large models without sacrificing performance. For learning efficiency (2), we develop algorithms for cheaper forward passes, accelerated PCA, and asynchronous gradient descent. Several of these methods are tailored for federated learning (FL), a private, distributed learning paradigm where data is decentralized among resource-constrained edge clients. We are exclusively concerned with improving efficiency during training: our techniques do not process pre-trained models or require a device to train over an architecture in its entirety.
  • Item
    Towards Generalized and Scalable Machine Learning on Structured Data
    (2024) Kong, Kezhi; Goldstein, Tom; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Deep Learning and Neural Networks have brought a transformative era to the field of machine learning, significantly influencing how we approach and utilize structured data. This dissertation is dedicated to exploring machine learning methodologies specifically designed for structured graphs and tables, aiming to enhance the performance of neural networks on these important data modalities. Graph Neural Networks (GNNs) have emerged as powerful architectures for learning and analyzing graph representations. However, the training of GNNs on large-scale datasets usually suffers from overfitting, posing significant generalization challenges for prediction problems. Meanwhile, conventional GNNs are hindered by scalability problems when deployed on industrial-level graph datasets. Moreover, for the table reasoning task, Large Language Models (LLMs) have shown competitive ability, but they cannot fully process large tables due to context limits and may fail to comprehend the complex relationships within tabular data. In this dissertation, we investigate algorithms and techniques to address the generalization and scalability issues of GNNs, as well as effective and efficient approaches to the table reasoning task. In the first work, we propose to leverage data augmentation to improve the generalization of GNNs. We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training. In the second and third works, we look into GNNs’ scalability problem. We propose VQ-GNN, a universal framework to scale up any convolution-based GNNs using Vector Quantization (VQ) without compromising the performance. We further propose GOAT, a global graph transformer that scales to large graphs with millions of nodes and is competitive on tasks of both homophilous and heterophilous graphs. Lastly, we propose OpenTab, an effective method for the open-domain table reasoning task built on advanced Large Language Models.
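
A minimal sketch of a FLAG-style training step: node features are perturbed by a few gradient-ascent steps while weight gradients accumulate, so the augmentation is effectively "free." The generic model(graph, x) interface, step size, and number of ascent steps are illustrative assumptions rather than the released implementation.

```python
# Illustrative FLAG-style adversarial feature augmentation for one training step.
import torch
import torch.nn.functional as F

def flag_step(model, graph, x, y, optimizer, ascent_steps=3, step_size=1e-3):
    # Perturb node features with gradient ascent on the loss while accumulating
    # weight gradients; take a single optimizer step at the end.
    model.train()
    optimizer.zero_grad()
    delta = torch.zeros_like(x).uniform_(-step_size, step_size).requires_grad_(True)
    total = 0.0
    for _ in range(ascent_steps):
        loss = F.cross_entropy(model(graph, x + delta), y) / ascent_steps
        loss.backward()                       # accumulates gradients on the weights
        total += loss.item()
        # Ascend on the perturbation using the sign of its gradient.
        delta = (delta.detach() + step_size * delta.grad.sign()).requires_grad_(True)
    optimizer.step()
    return total
```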
  • Item
    A Data Assimilation System for Lake Erie Based on the Local Ensemble Transform Kalman Filter
    (2024) Russell, David Scott; Ide, Kayo; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Data assimilation (DA) is the process by which a model forecast is adjusted to account for recent observations, taking into account both forecast and observation uncertainties. Although DA is common in numerical weather prediction (NWP) and other applications at global and regional scales, DA for large lakes such as North America's Great Lakes is still at an early stage of research and is not yet used operationally. In particular, the use of an ensemble-based approach to DA has scarcely been explored for large lakes, despite its growing popularity in operational NWP centers worldwide due to its dynamic estimation of the forecast covariance. Using Lake Erie as a test case, this study investigates the potential of ensemble DA to i) propagate forecast improvements throughout the lake and across forecast variables, and ii) inform the design of in-situ observing systems. The local ensemble transform Kalman filter (LETKF) is an efficient, localized, flexible variant of the ensemble Kalman filter (EnKF) that is used in multiple operational NWP centers. This work presents the development of a DA system for Lake Erie, which uses the LETKF to adjust forecasts of temperatures, currents, and water levels throughout the lake, using only lake surface temperature (LST) and temperature profile (TP) observations. The impact of both types of observations on all three forecast variables is evaluated within the framework of observing system simulation experiments (OSSEs), in which a DA system attempts to reconstruct a nature run (NR) by assimilating simulated observations of the NR. Observing system design questions are explored by comparing three different TP configurations. Assimilation of LST observations alone produces strong improvements to temperatures throughout the epilimnion (upper layer), while assimilation of TP observations extends these improvements to the hypolimnion (lower layer) near each profile. TP assimilation also shows improved representation of strong gyre currents and associated changes to thermocline depth and surface height, particularly when profiles sample from locations and depths where the thermal stratification in the forecast has been strongly affected by erroneous gyre currents. This work shows that the LETKF can be an efficient and effective tool for improving both forecasts and observing systems for large lakes, two essential ingredients in predicting the onset and development of economically and ecologically important phenomena such as harmful algal blooms (HABs) and hypoxia.
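
A minimal sketch of the ensemble transform Kalman filter analysis update that the LETKF applies independently within each local region; covariance localization is omitted, and the linear observation operator, inflation factor, and toy usage are assumptions made for brevity.

```python
# Illustrative ensemble transform Kalman filter (ETKF) analysis step, the
# building block the LETKF applies patch by patch. Localization is omitted.
import numpy as np

def etkf_analysis(X, y, H, R, inflation=1.0):
    # X: (n_state, k) forecast ensemble; y: (n_obs,) observations;
    # H: (n_obs, n_state) linear observation operator; R: (n_obs, n_obs) obs error cov.
    n, k = X.shape
    xb = X.mean(axis=1, keepdims=True)
    Xb = (X - xb) * np.sqrt(inflation)                 # inflated background perturbations
    Yb = H @ Xb                                        # perturbations in observation space
    C = Yb.T @ np.linalg.inv(R)
    Pa = np.linalg.inv((k - 1) * np.eye(k) + C @ Yb)   # analysis covariance in ensemble space
    w_mean = Pa @ C @ (y - (H @ xb).ravel())           # mean update weights
    vals, vecs = np.linalg.eigh((k - 1) * Pa)          # symmetric square root for perturbations
    W = vecs @ np.diag(np.sqrt(np.maximum(vals, 0.0))) @ vecs.T
    return xb + Xb @ (w_mean[:, None] + W)             # analysis ensemble, shape (n, k)

# Usage with a toy 3-variable state observed in its first component:
# rng = np.random.default_rng(0)
# X = rng.normal(size=(3, 10)); H = np.array([[1.0, 0.0, 0.0]])
# print(etkf_analysis(X, y=np.array([0.5]), H=H, R=np.eye(1) * 0.1).shape)
```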
  • Item
    Real-World Security for At-Risk Users
    (2024) Warford, Noel Harris; Mazurek, Michelle L.; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Whether due to context or circumstance, "at-risk" users experience risk factors that augment or amplify their chances of being digitally attacked and/or suffering disproportionate harms in comparison to the so-called "average user." Significant prior scholarship over the past decade has surfaced a variety of needs and concerns for different at-risk groups, from activists to survivors of intimate partner violence. However, when technology creators and researchers seek to develop tools to support these users, it can be difficult to reckon with the vastly varied, often contradictory, needs of different populations. To address this challenge, this thesis develops a framework for understanding at-risk user research, designed to help researchers and technology creators better guide technology design to support at-risk users. The framework includes ten contextual risk factors, such as prominence and marginalization, that impact the digital safety of these users, and three broad sets of both technical and non-technical protective practices these users currently employ to meet their needs. This thesis then analyzes the current barriers at-risk users face when seeking to address their particular digital-safety challenges. In order to test the utility of this framework, this thesis explores the needs and concerns of two distinct at-risk populations through interview studies: library IT staff and entertainment journalists. Library IT staff work under severe resource constraints to serve patrons who often do not have access to computing resources outside of the library, or may need the particular protection of intellectual freedom that libraries seek to provide. I observe that privacy and utility currently work in opposition to one another, but library IT staff in fact prioritize privacy over utility in accordance with strong library values. As another case study, entertainment journalists, those who cover movies, TV shows, sports, and other pop culture topics, work at the intersection of prominence and marginalization. Due to their profession, they must maintain a public presence, but that presence often exposes them to severe harassment, particularly if they also experience marginalization. These journalists conceptualize harassment as an unavoidable cost of their profession, but they perceive taking any action other than attempting to ignore it as largely fruitless. In line with prior scholarship, this thesis shows that institutional support, both from their employer as an organization and from colleagues, was essential to managing harassment response. This thesis demonstrates that understanding and accounting for particular contextual needs is crucial for improving digital safety for at-risk users. By working towards generalizing their needs and concerns, it will become easier to design both technical and non-technical solutions for both immediate and long-term challenges for at-risk populations.
  • Item
    Recognizing Object-Centric Attributes and Relations
    (2023) Pham, Khoi; Shrivastava, Abhinav; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Recognizing an object's visual appearance through its attributes, such as color and shape, and its relations to other objects in an environment, is an innate human ability that allows us to effortlessly interact with the world. This ability remains effective even when humans encounter unfamiliar objects or objects whose appearances evolve over time, as humans can still identify them by discerning their attributes and relations. This dissertation aims to equip computer vision systems with this capability, empowering them to recognize objects' attributes and relations so that they become more robust in handling real-world scene complexity. The thesis is structured into two main parts. The first part focuses on recognizing attributes for objects, an area where existing research is limited to domain-specific attributes or constrained by small-scale and noisy data. We overcome these limitations by introducing a comprehensive dataset for attributes in the wild, marked by challenges with attribute diversity, label sparsity, and data imbalance. To navigate these challenges, we propose techniques that address class imbalance, employ attention mechanisms, and utilize contrastive learning for aligning objects with shared attributes. However, as such a dataset is expensive to collect, we also develop a framework that leverages large-scale, readily available image-text data for learning attribute prediction. The proposed framework can effectively scale up to predict a larger space of attribute concepts in real-world settings, including novel attributes represented in arbitrary text phrases that are not encountered during training. We showcase various applications of the proposed attribute prediction frameworks, including semantic image search and object image tagging with attributes. The second part delves into the understanding of visual relations between objects. First, we investigate how the interplay of attributes and relations can improve image-text matching. Moving beyond the computationally expensive cross-attention network of previous studies, we introduce a dual encoder framework using scene graphs that is more efficient yet equally powerful on current image-text retrieval benchmarks. Our approach can produce scene graph embeddings rich in attribute and relation semantics, which we show to be useful for image retrieval and image tagging. Lastly, we present our work on training large vision-language models on image-text data for recognizing visual relations. We formulate a new subject-centric approach that predicts multiple relations simultaneously conditioned on a single subject. Our approach is among the first to learn from both weakly- and strongly-grounded image-text data to predict an extensive range of relationship classes.
  • Item
    An Approximation Framework for Large-Scale Spatial Games
    (2023) Hsiao, Vincent; Nau, Dana; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Game theoretic modeling paradigms such as Evolutionary Game Theory (EGT) and Mean Field Games (MFG) are used to model a variety of multi-agent systems in which the agents interact in a game theoretic fashion. These models seek to answer two questions: how to predict the forward dynamics of a population and how to control them. However, both modeling paradigms have unique issues that can make them difficult to analyze in closed form when applied to spatial domains. On one hand, spatial EGT models are difficult to evaluate mathematically, and both simulations and approximations run into accuracy and tractability issues. On the other hand, MFG models are not typically formulated to handle domains where agents have strategies and physical locations. Furthermore, any MFG approach for controlling strategy evolution on spatial domains must also address the same accuracy and efficiency challenges in the evaluation of its forward dynamics as those faced by evolutionary game approaches. This dissertation presents a new modeling paradigm and approximation technique termed Bayesian-MFG for large-scale multi-agent games on spatial domains. The new framework lies at an intersection of techniques drawn from spatial evolutionary games, mean field games, and probabilistic reasoning. First, we describe our Bayesian network approximation technique for spatial evolutionary games to address the accuracy issues faced by lower-order approximation methods. We introduce additional algorithms used to improve the computational efficiency of Bayesian network approximations. Alongside this, we describe our Pair-MFG model, a method for defining pair-level approximate MFGs for problems with distinct strategy and spatial components. We combine the Pair-MFG model and Bayesian network approximations into a unified Bayesian-MFG framework. Using this framework, we present a method for incorporating Bayesian network approximations into a control-problem framework, allowing for the derivation of more accurate control policies when compared to existing MFG approaches. We demonstrate the effectiveness of our framework through its application to a variety of domains such as evolutionary game theory, reaction-diffusion equations, and network security.
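
A minimal sketch of the kind of lattice-based spatial evolutionary game whose forward dynamics the Bayesian-network and pair approximations aim to predict: a spatial Prisoner's Dilemma with best-neighbor imitation updates. The payoff values and update rule are illustrative assumptions, not the dissertation's exact benchmark.

```python
# Illustrative lattice Prisoner's Dilemma with imitation updates (the kind of
# spatial evolutionary game whose dynamics approximation methods try to predict).
import numpy as np

def simulate_spatial_game(size=64, steps=200, b=1.6, seed=0):
    # 1 = cooperate, 0 = defect. Payoffs (assumed): R=1, T=b, S=0, P=0.
    rng = np.random.default_rng(seed)
    grid = (rng.random((size, size)) < 0.5).astype(int)
    shifts = [(1, 0), (-1, 0), (0, 1), (0, -1)]          # von Neumann neighborhood (torus)
    for _ in range(steps):
        coop_neighbors = sum(np.roll(grid, s, axis=(0, 1)) for s in shifts)
        payoff = np.where(grid == 1, coop_neighbors * 1.0, coop_neighbors * b)
        best_payoff, best_strategy = payoff.copy(), grid.copy()
        for s in shifts:
            neigh_payoff = np.roll(payoff, s, axis=(0, 1))
            neigh_strategy = np.roll(grid, s, axis=(0, 1))
            better = neigh_payoff > best_payoff
            best_payoff = np.where(better, neigh_payoff, best_payoff)
            best_strategy = np.where(better, neigh_strategy, best_strategy)
        grid = best_strategy                              # imitate the best-scoring neighbor
    return grid.mean()                                    # final fraction of cooperators

# print(simulate_spatial_game())
```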
  • Item
    Towards Effective Temporal Modeling for Video Understanding
    (2023) He, Bo; Shrivastava, Abhinav; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The exponential increase of video content available in the world has led to a growing need for advanced systems capable of autonomously analyzing and interpreting video data. Nowadays, video understanding is a fundamental research topic in computer vision, focusing on the effective extraction and analysis of information from videos. Compared to the image modality, the video modality differs significantly in its additional temporal dependencies, which provide crucial clues for understanding what happens across time. Therefore, how to effectively model temporal relationships is of vital importance for video understanding research. In this thesis, we aim to advance the field of temporal modeling, making video understanding systems more reliable, accurate, and flexible for various applications. In the first part, we introduce three strategies to model temporal dependencies in different downstream tasks and scenarios, including action recognition, temporal action localization, and video summarization. In the second part, we present a comprehensive large multimodal model for video understanding, constructed using recent advanced large language models. This model is capable of handling various video understanding tasks within a unified and integrated framework. Specifically, we first propose a Global Temporal Attention (GTA) network, which models global temporal relationships via decoupled spatial and temporal attention operations. This approach significantly enhances the model's capability to recognize actions in videos with reduced computational cost. However, raw videos in the real world are untrimmed and contain many background frames; before correctly recognizing and classifying the action labels, we need to detect and localize the actions of interest in long untrimmed videos. Therefore, we introduce the ASM-Loc framework to effectively localize actions of interest under the weakly-supervised setting, which significantly reduces the need for labor-intensive manual labeling. Then, given that real-world data often comprises multiple modalities of information, such as video and text, we present A2Summ, which tackles the challenge of summarizing both long video and text sequences with time correspondence, providing a solution for summarizing multimodal data. In the first part of this thesis, we focus on developing specialized models for individual video understanding tasks. These models are each specifically designed for a particular task, which limits their generalization ability to other areas and makes them less practical for diverse real-world applications. To address this limitation, in the second part of the thesis, we further present a unified large multimodal model, capable of handling multiple video understanding tasks. This model is built upon the foundation of powerful large language models, making it adaptable for a wide range of video understanding tasks, including video classification, online action prediction, video question answering, and video captioning. This method offers a more versatile and general solution, significantly enhancing the applicability of our models in real-world video understanding scenarios.
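
A minimal sketch of decoupled spatial and temporal self-attention over a clip's patch tokens, the basic idea behind the Global Temporal Attention design; the token layout and the use of nn.MultiheadAttention are illustrative assumptions rather than the GTA implementation.

```python
# Illustrative decoupled space-time attention: attend within each frame, then
# across time for each spatial location, instead of one joint (T*S)x(T*S) attention.
import torch
import torch.nn as nn

class DecoupledSpaceTimeAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                  # x: (batch, frames, patches, dim)
        b, t, s, d = x.shape
        xs = x.reshape(b * t, s, d)        # spatial attention within each frame
        xs, _ = self.spatial(xs, xs, xs)
        x = xs.reshape(b, t, s, d)
        xt = x.permute(0, 2, 1, 3).reshape(b * s, t, d)   # temporal attention per location
        xt, _ = self.temporal(xt, xt, xt)
        return xt.reshape(b, s, t, d).permute(0, 2, 1, 3)

# tokens = torch.randn(2, 8, 49, 256)      # 8 frames of 7x7 patch tokens
# print(DecoupledSpaceTimeAttention()(tokens).shape)
```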