Theses and Dissertations from UMD
Permanent URI for this community: http://hdl.handle.net/1903/2
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a four-month delay in the appearance of a given thesis/dissertation in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
174 results
Search Results
Item Representation Learning for Reinforcement Learning: Modeling Non-Gaussian Transition Probabilities with a Wasserstein Critic (2024) Tse, Ryan; Zhang, Kaiqing; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Reinforcement learning algorithms depend on effective state representations when solving complex, high-dimensional environments. Recent methods learn state representations using auxiliary objectives that aim to capture relationships between states that are behaviorally similar, meaning states that lead to similar future outcomes under optimal policies. These methods learn explicit probabilistic state transition models and compute distributional distances between state transition probabilities as part of their measure of behavioral similarity. This thesis presents a novel extension to several of these methods that directly learns the 1-Wasserstein distance between state transition distributions by exploiting the Kantorovich-Rubinstein duality. This method eliminates parametric assumptions about the state transition probabilities while providing a smoother estimator of distributional distances. Empirical evaluation demonstrates improved sample efficiency over some of the original methods and a modest increase in computational cost per sample.
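As background for the Wasserstein critic above: the learned quantity is the 1-Wasserstein distance, which by Kantorovich-Rubinstein duality equals a supremum over 1-Lipschitz critic functions. For one-dimensional empirical distributions with equally many samples, the same distance has a simple closed form (the mean absolute difference of sorted samples), which makes the critic's target concrete. The sketch below is an illustrative NumPy calculation, not the thesis's method:

```python
import numpy as np

def w1_empirical_1d(x, y):
    """1-Wasserstein distance between two 1-D empirical distributions with
    equally many samples: the mean of |x_(i) - y_(i)| over sorted samples."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    assert x.shape == y.shape, "equal sample sizes assumed in this sketch"
    return float(np.mean(np.abs(x - y)))

# Translating a distribution by c moves it by exactly c in W1.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
print(w1_empirical_1d(x, x))        # 0.0
print(w1_empirical_1d(x, x + 2.0))  # ~2.0
```

In higher dimensions no such closed form exists, which is why a learned critic (constrained to be approximately 1-Lipschitz) is used to estimate the dual supremum instead.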
The results establish that relaxing theoretical assumptions about state transition modeling leads to more flexible and robust representation learning while maintaining strong performance characteristics.

Item OUT OF DISTRIBUTION EVALUATION OF NATURAL LANGUAGE PROCESSING SYSTEMS: GENERALIZATION TO LOW-RESOURCE AND DISTANT LANGUAGES AND HUMAN-AI COLLABORATIVE WRITING (2024) Richburg, Aquia; Carpuat, Marine; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Large language models have revolutionized natural language processing with their capabilities in text generation and understanding. Their rich contextual representations, learned from training on diverse text datasets, have led LLMs to be used across a variety of settings. However, this increases the chance of models being used in unintended use cases and causing harm to users. This dissertation delves into empirical studies of out-of-distribution issues in text generation (machine translation) and text classification (authorship analysis) tasks, examining how LLMs perform in settings distant from their training distributions. In our first work, the goal is to understand the characteristics of the training distribution of LLMs by visualizing the roles of samples during the training of a machine translation model. Our results indicate that sample contributions are not uniform and play complex roles throughout the training process. This highlights the difficulty of describing samples that are representative of the training distribution and motivates thorough evaluation of models in diverse settings. Our second and third works turn to the evaluation of LLMs in out-of-distribution settings to better understand their strengths and limitations for generalization on unseen tasks.
We evaluate LLMs in machine translation tasks, focusing on how translation quality is affected by the presence or absence of specific language pairs in the training data. Our findings show that while finetuning improves translation for unseen languages, the impact varies across different language pairs. This emphasizes the need for further research to enable effective massively multilingual translation with LLMs. In text classification, we explore out-of-distribution generalization for authorship analysis in the context of human-AI collaborative writing. Our studies reveal that traditional AI detection models underperform when distinguishing between human and AI co-written text. Simpler n-gram techniques are more robust than LLMs for authorship identification, suggesting the need for adapted authorship analysis tools. In summary, this dissertation advances our understanding of LLM generalization and provides insights for improving the robustness and adaptability of NLP systems.

Item Learning in Large Multi-Agent Systems (2024) Kara, Semih; Martins, Nuno C; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
In this dissertation, we study a framework of large-scale multi-agent strategic interactions. The agents are nondescript and use a learning rule to repeatedly revise their strategies based on their payoffs. Within this setting, our results are structured around three main themes: (i) guaranteed learning of Nash equilibria, (ii) the inverse problem, i.e., estimating the payoff mechanism from the agents' strategy choices, and (iii) applications to the placement of electric vehicle charging stations. In the traditional setup, the agents' inter-revision times follow identical and independent exponential distributions. We expand on this by allowing these intervals to depend on the agents' strategies or have Erlang distributions.
These extensions enhance the framework's modeling capabilities, enabling it to address problems such as task allocation with varying service times or multiple stages. We also explore a third generalization, concerning the accessibility among strategies. The majority of the existing literature assumes that the agents can transition between any two strategies, whereas we allow only certain alternatives to be accessible from certain others. This adjustment further improves the framework's modeling capabilities, such as by incorporating constraints on strategy switching related to spatial and informational factors. For all of these extensions, we use Lyapunov's method and passivity-based techniques to find conditions on the revision rates, learning rule, and payoff mechanism that ensure the agents learn to play a Nash equilibrium of the payoff mechanism. For our second class of problems, we adopt a multi-agent inverse reinforcement learning perspective. Here, we assume that the learning rule is known but, unlike in existing work, the payoff mechanism is unknown. We propose a method to estimate the unknown payoff mechanism from sample path observations of the populations' strategy profile. Our approach is two-fold: we estimate the agents' strategy transitioning probabilities, which we then use, along with the known learning rule, to obtain a payoff mechanism estimate. Our findings regarding the estimation of transitioning probabilities are general, while for the second step, we focus on linear payoff mechanisms and three well-known learning rules (Smith, replicator, and Brown-von Neumann-Nash). Additionally, under certain assumptions, we show that we can use the payoff mechanism estimate to predict the Nash equilibria of the unknown mechanism and forecast the strategy profile induced by other rules. Lastly, we contribute to a traffic simulation tool by integrating electric vehicles, their charging behaviors, and charging stations.
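The first stage of the inverse-problem approach above is estimating strategy-transitioning probabilities from observed strategy choices. A minimal sketch of that stage, under the simplifying assumption that we observe a single discrete sequence of strategy choices (the dissertation works with sample paths of the population's strategy profile), is a row-normalized count matrix:

```python
import numpy as np

def estimate_transition_matrix(path, n_strategies):
    """Row-normalized empirical estimate of strategy-switch probabilities
    from one observed sequence of strategy indices (hypothetical input)."""
    counts = np.zeros((n_strategies, n_strategies))
    for s, s_next in zip(path[:-1], path[1:]):
        counts[s, s_next] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Guard against division by zero for strategies that are never visited.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

path = [0, 1, 1, 0, 2, 2, 2, 0, 1]      # toy observation sequence
P = estimate_transition_matrix(path, 3)
print(P[0])  # empirical fractions of moves 0->0, 0->1, 0->2
```

In the dissertation's second step, an estimate like this is combined with the known learning rule to recover the payoff mechanism; the sketch only shows the counting stage.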
This simulation tool is based on spatial-queueing principles and, although less detailed than some microscopic simulators, it runs much faster and accurately represents traffic rules. Using this tool, we identify optimal charging station locations (on real roadway networks) that minimize the overall traffic.

Item Second Wave Mechanics (2024) Fabbri, Anthony; Herrmann, Jeffrey W; Mechanical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The COVID-19 pandemic exhibited well-documented "waves" of the virus's progression, which can be analyzed to predict future wave behavior. This thesis describes a data analysis algorithm for analyzing pandemic behavior and other, similar problems. The approach splits the linear and sinusoidal elements of a pandemic curve in order to predict the behavior of future "waves" of infection from previous ones, producing a very long-term prediction of a pandemic. Common wave-shape patterns can also be identified to infer mutations that have recently occurred but are not yet widely recognized, and thus to predict the remaining course of the wave. By considering only the patterns in the data that could plausibly have acted in tandem to generate the observed results, many false patterns can be eliminated, and hidden variables can therefore be estimated with a very high degree of confidence.
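The linear/sinusoidal split described above can be illustrated, under the simplifying assumption that the wave frequency is known, as an ordinary least-squares fit of a trend plus one sinusoid. The data and the 60-day period below are synthetic stand-ins, not values from the thesis:

```python
import numpy as np

# Synthetic "case curve": a linear trend plus one wave of assumed known frequency.
omega = 2 * np.pi / 60.0          # hypothetical 60-day wave period
t = np.arange(180, dtype=float)
y = 5.0 + 0.3 * t + 40.0 * np.sin(omega * t)

# Design matrix for y ~ a + b*t + c*sin(omega t) + d*cos(omega t);
# the sin/cos pair absorbs an unknown phase, so the fit stays linear.
A = np.column_stack([np.ones_like(t), t, np.sin(omega * t), np.cos(omega * t)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b, c, d = coef
print(a, b, c, d)  # recovers the trend (5.0, 0.3) and wave amplitude (40.0, 0.0)
```

Subtracting the fitted trend leaves the sinusoidal residual, which is the "wave" component that the thesis extrapolates forward.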
Similar mathematical relationships can reveal hidden variables in other underlying differential equations.

Item MATHEMATICS OF THE DYNAMICS AND CONTROL OF THE SARS-COV-2 PANDEMIC (2024) Pant, Binod; Gumel, Abba B.; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The pneumonia-like illness that emerged late in 2019, caused by SARS-CoV-2 (and coined COVID-19), became the greatest public health challenge humans have faced since the 1918/1919 influenza pandemic, causing over 670 million confirmed cases and 7 million fatalities globally. This dissertation contributes deep qualitative insights into, and understanding of, the transmission dynamics and control of the pandemic, using mathematical modeling approaches together with data analytics and computation. Specifically, it addresses some of the pertinent challenges associated with modeling the dynamics of the disease, notably the disproportionate effect of the disease on certain (risk and demographic) populations (inducing various heterogeneities) and behavior changes with respect to adherence, or lack thereof, to interventions. An $m$-group model, which monitors the temporal dynamics of the disease in $m$ heterogeneous populations, was designed and used to study the impact of age heterogeneity and vaccination on the spread of the disease in the United States. For instance, the disease-free equilibrium for the case of the model with $m=1$ (i.e., the model with a homogeneous population) was shown to be globally-asymptotically stable for two special cases (when the vaccine is perfect or when disease-induced mortality is negligible) whenever the associated reproduction number of the homogeneous model is less than one.
The homogeneous model has a unique endemic equilibrium whenever the reproduction threshold exceeds unity (this equilibrium was shown to be globally-asymptotically stable for a special case, using a nonlinear Lyapunov function of Goh-Volterra type). The homogeneous model was fitted to the observed cumulative mortality data for the SARS-CoV-2 pandemic in the United States during the period from January to May of 2022 (when Omicron was the predominant variant). It was shown that vaccine-derived herd immunity (needed to eliminate the disease) cannot be attained using the homogeneous model regardless of the proportion of individuals fully vaccinated. Such vaccine-derived immunity can, however, be achieved using the $m$-group heterogeneous model, with $m=2$ (where the total population is split into two groups: those under 65 years of age, and those 65 years and older), if at least 61% of the susceptible population is fully vaccinated. Thus, this dissertation shows that heterogeneity reduces the level of vaccine coverage needed to eliminate the pandemic (and models that do not account for heterogeneity may be overestimating the vaccination coverage needed to achieve herd immunity in the community). To quantify the impact of human behavior changes on the spread and control of the pandemic, we designed a novel behavior-epidemiology model which considers numerous metrics for inducing human behavior changes (such as the current level of disease burden and intervention adherence fatigue). Unlike the equivalent model without human behavior explicitly incorporated, the behavior-epidemiology model fits the observed cumulative mortality and predicts the observed daily mortality data very well. It was also shown that the behavior metrics related to the level of SARS-CoV-2 mortality and symptomatic transmission were more influential in inducing positive behavior changes than all other behavior metrics considered.
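For the homogeneous ($m=1$) case, the vaccine-derived herd-immunity computation reduces to a standard threshold calculation: with basic reproduction number R0 and vaccine efficacy eps, vaccinating a fraction p drives the control reproduction number to roughly (1 - eps*p)*R0, so elimination requires p of at least (1 - 1/R0)/eps. The sketch below uses this textbook formula with illustrative numbers, not the dissertation's fitted model or values:

```python
def critical_coverage(R0, efficacy):
    """Fraction p* that must be vaccinated so that (1 - efficacy*p)*R0 <= 1.
    A value above 1 means elimination is unattainable at this efficacy,
    mirroring the m=1 finding described in the abstract."""
    return (1.0 - 1.0 / R0) / efficacy

# Illustrative numbers only (not fitted values from the dissertation):
print(critical_coverage(R0=3.0, efficacy=0.95))  # feasible: about 0.70
print(critical_coverage(R0=8.0, efficacy=0.75))  # above 1: unattainable
```

The heterogeneous $m=2$ result in the abstract refines exactly this kind of calculation by letting coverage vary across age groups, which lowers the required overall coverage.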
Finally, a model was developed to assess the utility of wastewater surveillance to study the transmission dynamics and control of SARS-CoV-2 in a community. Specifically, we developed and calibrated a wastewater-based epidemiology model using wastewater data from Miami-Dade County, Florida, during the third wave of the SARS-CoV-2 pandemic. The model showed a strong correlation between the observed (detected) weekly case data and the corresponding weekly data predicted by the calibrated model. The model's prediction of the week when the maximum number of SARS-CoV-2 cases would be recorded in the county during the simulation period precisely matched the time when the maximum observed/reported cases were recorded (August 14, 2021). Furthermore, the model's projection of the maximum number of cases for the week of August 14, 2021 was about 15 times higher than the maximum observed weekly case count for the county (i.e., the maximum case count estimated by the model was 15 times higher than the actual/observed count for confirmed cases). In addition to being in line with other modeling studies, this result is consistent with the CDC estimate that the reported confirmed case data may be 10 times lower than the actual number (since the confirmed data did not account for asymptomatic and presymptomatic transmission). Furthermore, the model accurately predicted a one-week lag between the peaks in weekly COVID-19 case and hospitalization data during the time period of the study in Miami-Dade, with the model-predicted hospitalizations peaking on August 21, 2021. A detailed time-varying global sensitivity analysis was carried out to determine the parameters (wastewater-based, epidemiological, and biological) that have the most influence on the chosen response function (namely, the cumulative viral load in the wastewater). This analysis identified key parameters that significantly affect the value of the response function (hence, they should be targeted for intervention).
This dissertation conclusively showed that wastewater surveillance data can be a very powerful indicator for measuring (i.e., providing an early-warning signal and the current burden) and predicting the future trajectory and burden (e.g., number of cases and hospitalizations) of emerging and re-emerging infectious diseases, such as SARS-CoV-2, in a community.

Item Quantum Algorithms for Nonconvex Optimization: Theory and Implementation (2024) Leng, Jiaqi; Wu, Xiaodi; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Continuous optimization problems arise in virtually all disciplines of quantitative research. While convex optimization has been well studied in recent decades, large-scale nonconvex optimization problems remain intractable in both theory and practice. Quantum computers are expected to outperform classical computers in certain challenging computational problems. Some quantum algorithms for convex optimization have shown asymptotic speedups, while the quantum advantage for nonconvex optimization is yet to be fully understood. This thesis focuses on Quantum Hamiltonian Descent (QHD), a quantum algorithm for continuous optimization problems. A systematic study of QHD is presented, including theoretical results concerning nonconvex optimization and efficient implementation techniques for quantum computers. QHD is derived as the path integral of classical gradient descent algorithms. Due to the quantum interference of classical descent trajectories, QHD exhibits drastically different behavior from classical gradient descent, especially for nonconvex problems. Under mild assumptions, we prove that QHD can always find the global minimum of an unconstrained optimization problem given a sufficiently long runtime.
Moreover, we demonstrate that QHD can efficiently solve a family of nonconvex optimization problems with exponentially many local minima, which most commonly used classical optimizers require super-polynomial time to solve. Additionally, by using QHD as an algorithmic primitive, we show a quadratic oracular separation between quantum and classical computing. We consider the implementation of QHD for two important paradigms of quantum computing, namely digital (fault-tolerant) and analog quantum computers. Exploiting the product formula for quantum Hamiltonian simulation, we demonstrate that a digital quantum computer can implement QHD with gate complexity nearly linear in problem dimension and evolution time. With a hardware-efficient sparse Hamiltonian simulation technique known as Hamiltonian embedding, we develop an analog implementation recipe for QHD that addresses a broad class of nonlinear optimization problems, including nonconvex quadratic programming. This analog implementation approach is deployed on large-scale quantum spin-glass simulators, and the empirical results strongly suggest that QHD has great potential for highly nonconvex and nonlinear optimization tasks.

Item CUR Matrix Approximation Through Convex Optimization (2024) Linehan, Kathryn; Balan, Radu V; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
In this dissertation we present work on CUR matrix approximation. Specifically, we present 1) an approximation of the proximal operator of the L-infinity norm using a neural network, 2) a novel deterministic CUR formulation and algorithm, and 3) a novel application of CUR as a feature selection method to determine discriminant proteins when clustering protein expression data in a self-organizing map (SOM).
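For context on the CUR work above: the generic CUR approximation that the dissertation's convex-optimization formulation refines keeps chosen columns C and rows R of A and fills in the middle factor as U = C⁺AR⁺, the Frobenius-optimal choice once C and R are fixed. A minimal sketch with hand-picked indices (the dissertation selects columns and rows by convex optimization instead):

```python
import numpy as np

def cur(A, col_idx, row_idx):
    """Basic CUR approximation: C and R are selected columns/rows of A,
    and U = pinv(C) @ A @ pinv(R) minimizes ||A - C U R||_F for fixed C, R."""
    C = A[:, col_idx]
    R = A[row_idx, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R

# A rank-2 matrix is reproduced exactly from any 2 independent columns/rows.
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
C, U, R = cur(A, col_idx=[0, 1], row_idx=[0, 1])
print(np.allclose(C @ U @ R, A))  # True
```

Unlike the SVD, the factors C and R here are actual columns and rows of the data matrix, which is what makes CUR usable as a feature selection method in the protein-expression application.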
The proximal operator of the L-infinity norm arises in our CUR algorithm. Since computing the proximal operator of the L-infinity norm requires a sort of the input data (or at least a partial sort similar to quicksort), we present a neural network to approximate the proximal operator. A novel aspect of the network is that it is able to accept vectors of varying lengths due to a feature selection process that uses moments of the input data. We present results on the accuracy of the approximation, feature importance, and computational efficiency of the approach. We also present an algorithm to calculate the proximal operator of the L-infinity norm exactly, relate it to the Moreau decomposition, and compare its computational efficiency to that of the approximation. Next, we present a novel deterministic CUR formulation that uses convex optimization to form the matrices C and R, and a corresponding algorithm that uses bisection to ensure that the user-selected number of columns appears in C and the user-selected number of rows appears in R. We implement the algorithm using the surrogate functional technique of Daubechies et al. [Communications on Pure and Applied Mathematics, 57.11 (2004)] and extend the theory of this approach to apply to our CUR formulation. Numerical results are presented that demonstrate the effectiveness of our CUR algorithm as compared to the singular value decomposition (SVD) and other CUR algorithms. Last, we use our CUR approximation as a feature selection method in the application of Higuera et al. [PLOS ONE, 10(6) (2015)] to determine discriminant proteins when clustering protein expression data in an SOM. This is a novel application of CUR and, to the best of our knowledge, the first use of CUR on protein expression data.
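The exact proximal operator that the network approximates can be written via the Moreau decomposition mentioned above: since the convex conjugate of the L-infinity norm is the indicator of the L1 unit ball, the prox of lam*||.||_inf is x minus the Euclidean projection of x onto the L1 ball of radius lam, and that projection is computable with a sort. The reference sketch below follows this standard recipe; it is not the thesis's exact algorithm or its network approximation:

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection onto the l1 ball of the given radius (sort-based)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                  # sorted magnitudes, descending
    css = np.cumsum(u)
    # Largest k with u_k > (css_k - radius)/(k+1) determines the threshold.
    k = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    theta = (css[k] - radius) / (k + 1)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(x, lam):
    """Prox of lam*||.||_inf via Moreau decomposition:
    x minus the projection of x onto the l1 ball of radius lam."""
    return x - project_l1_ball(x, radius=lam)

x = np.array([3.0, -1.0, 0.5])
print(prox_linf(x, lam=1.0))  # shrinks only the largest entry: [2. -1. 0.5]
```

The sort inside the projection is exactly the cost the thesis's neural approximation is designed to avoid.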
We compare the performance of our CUR algorithm to other CUR algorithms and to the Wilcoxon rank-sum test (the original feature selection method in that work).

Item EFFICIENT COMPUTATIONAL ALGORITHMS FOR MAGNETIC EQUILIBRIUM IN A FUSION REACTOR (2024) Liang, Jiaxing; Elman, Howard C.; Sanchez-Vizuet, Tonatiuh; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
In a magnetic confinement fusion reactor, like a Tokamak, hydrogen isotopes are injected into the chamber and heated to form a plasma. The hot plasma tends to expand, so it is crucial to confine it within the center of the reactor to prevent contact with the reactor walls. This confinement is achieved through magnetic fields generated by currents running through external coils surrounding the reactor. However, these currents may suffer from uncertainties, which could arise from factors like temperature fluctuations and material impurities. These variables introduce a level of unpredictability in the plasma's behavior and the overall stability of the fusion process. This thesis aims to investigate the impact that stochasticity in the current intensities has on the confinement properties of the plasma and to estimate the expected behavior of the magnetic field. While the focus is on the variability in current intensities, the tools developed can be applied to other sources of uncertainty, such as the positioning of the coils and the source term parameters. To quantify the variability in model predictions and to evaluate the statistical properties of solutions over a range of parameter values, traditional sampling methods like Monte Carlo often require intensive and expensive nonlinear computations. To tackle this challenge, we propose three approaches. First, we focus on the development and application of a surrogate function, constructed via a stochastic collocation approach on a sparse grid in the parameter space.
This surrogate function is employed to replace the nonlinear solution in Monte Carlo sampling processes. For our numerical experiments, we evaluate the efficiency and accuracy of the outcomes produced by the surrogate, in comparison with those obtained through direct nonlinear solutions. Our findings indicate that a carefully selected surrogate function reduces the sampling cost (achieving acceleration factors ranging from 7 to over 30) while preserving the accuracy of the results. The second part of the thesis explores the multilevel Monte Carlo approach, investigating its potential for cost savings compared to simple Monte Carlo. This method involves conducting the majority of computations on a sequence of spatial grids coarser than what a simple Monte Carlo simulation would typically use. We examine this approach with nonlinear computation, using both uniformly refined meshes and adaptively refined grids guided by a discrete error estimator. Numerical experiments reveal substantial cost reductions achieved through multilevel methods, typically ranging from a factor of 60 to exceeding 200, and adaptive gridding results in more accurate computation of relevant geometric parameters. In the last part of this work, we explore hybrid methods that integrate surrogates with multilevel Monte Carlo to further reduce the sampling cost. We establish the optimal construction and sampling costs for surrogate-based multilevel Monte Carlo. Numerical results demonstrate that surrogate-based multilevel Monte Carlo remarkably reduces the computational burden, requiring only 0.1 to 14 seconds for a target relative mean square error ranging from $8\times 10^{-3}$ to $2\times 10^{-4}$, reducing the cost of direct computation by factors of 50 to 300. In terms of accuracy, the surrogate-based sampling results exhibit close congruence with those obtained via direct computation, both in the plasma boundary and in geometric descriptors.
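The multilevel Monte Carlo idea described above can be sketched abstractly: write E[P_L] as E[P_0] plus a telescoping sum of level corrections E[P_l - P_{l-1}], then spend many samples on the cheap coarse level and only a few on the fine corrections, whose variance is small. The toy problem below uses a quantized integrand as a stand-in for coarse/fine PDE solves and is illustrative only:

```python
import numpy as np

def mlmc(sampler, levels, n_samples):
    """Multilevel Monte Carlo estimate of E[P_L] via the telescoping sum
    E[P_0] + sum_l E[P_l - P_{l-1}], with independent samples per level.
    sampler(l, n) returns n values of P_0 for l = 0, and n paired values
    of (P_l - P_{l-1}) computed from the SAME random inputs for l > 0."""
    return sum(np.mean(sampler(l, n_samples[l])) for l in range(levels + 1))

# Toy problem: E[X^2] for X ~ U(0,1); the level-l "solver" quantizes x^2
# to a grid of spacing 2^-(l+2), mimicking a mesh that refines with l.
rng = np.random.default_rng(2)

def sampler(l, n):
    x = rng.uniform(size=n)
    fine = np.round(x**2 * 2**(l + 2)) / 2**(l + 2)
    if l == 0:
        return fine
    coarse = np.round(x**2 * 2**(l + 1)) / 2**(l + 1)  # coupled coarse solve
    return fine - coarse

# Many cheap coarse samples, few expensive fine corrections.
est = mlmc(sampler, levels=4, n_samples=[100000, 20000, 5000, 1000, 300])
print(est)  # close to E[X^2] = 1/3
```

The cost savings reported in the abstract come from exactly this sample-allocation pattern, with the surrogate-based hybrid replacing some of the solves outright.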
The primary contributions of our work entail the application of stochastic collocation techniques and multilevel Monte Carlo methods to analyze plasma behavior under uncertainties in coil currents within fusion reactors. Furthermore, we establish the universal sampling cost for the surrogate-enhanced multilevel Monte Carlo approach. Our methodology offers a new paradigm for approaching and managing computational challenges in this field.

Item Structured discovery in graphs: Recommender systems and temporal graph analysis (2024) Peyman, Sheyda Do'a; Lyzinski, Vince V.; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Graph-valued data arises in numerous diverse scientific fields, ranging from sociology, epidemiology, and genomics to neuroscience and economics. For example, sociologists have used graphs to examine the roles of user attributes (gender, class, year) at American colleges and universities through the study of Facebook friendship networks and have studied segregation and homophily in social networks; epidemiologists have recently modeled human-nCoV protein-protein interactions via graphs; and neuroscientists have used graphs to model neuronal connectomes. The structure of a graph, including its latent features, the relationships between vertices, and the importance of each vertex, comprises highly important graph properties that are central to graph analysis and inference. While it is common to imbue nodes and/or edges with implicitly observed numeric or qualitative features, in this work we consider latent network features that must be estimated from the network topology. The main focus of this text is to find ways of extracting the latent structure in the presence of network anomalies.
These anomalies occur in different scenarios, including cases in which the graph is subject to an adversarial attack and the anomaly inhibits inference, and cases in which detecting the anomaly is itself the key inference task. The former case is explored in the context of vertex nomination information retrieval, where we consider both analytic methods for countering the adversarial noise and the addition of a user-in-the-loop in the retrieval algorithm to counter potential adversarial noise. In the latter case, we use graph embedding methods to discover sequential anomalies in network time series.

Item The Shuffling Effect: Vertex Label Error's Impact on Hypothesis Testing, Classification, and Clustering in Graph Data (2024) Saxena, Ayushi; Lyzinski, Vince; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The increasing prevalence of graph and network-valued data across various disciplines has prompted significant interest and research in recent years. This dissertation explores the impact of vertex shuffling, or vertex misalignment, on the statistical network inference tasks of hypothesis testing, classification, and clustering. Our focus is within the framework of multiple-network inference, where existing methodologies often assume known vertex correspondence across networks; this assumption frequently does not hold in practice. Through theoretical analyses, simulations, and experiments, we aim to reveal the effects of vertex shuffling on performance across these tasks. Our investigation begins with an examination of two-sample network hypothesis testing, focusing on the decrease in statistical power resulting from vertex shuffling. In this work, our analysis focuses on the random dot product and stochastic block model network settings.
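The vertex misalignment studied above can be made concrete on an adjacency matrix: shuffling vertex labels replaces A by PAPᵀ for a permutation matrix P, which preserves graph-level structure (for instance the degree multiset) while destroying the vertex correspondence that multiple-network inference methods assume. A small illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

# A small undirected graph as a symmetric 0/1 adjacency matrix.
n = 6
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1)
A = A + A.T

# Shuffle vertex labels: permute rows and columns with the same permutation.
perm = rng.permutation(n)
A_shuffled = A[np.ix_(perm, perm)]   # equals P @ A @ P.T for permutation matrix P

# Graph-level invariants survive (degree multiset), but vertex identity is lost:
print(sorted(A.sum(axis=0)) == sorted(A_shuffled.sum(axis=0)))  # True
print(np.array_equal(A, A_shuffled))  # typically False once labels move
```

Comparing A and A_shuffled entrywise, as a naive two-sample test would, now mixes up which vertex is which; this is the source of the power loss the dissertation quantifies, and recovering perm is the graph-matching problem it examines as a mitigation.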
Subsequent chapters delve into the effects of shuffling on graph classification and clustering, showcasing how misalignment negatively impacts accuracy in categorizing and clustering graphs (and vertices) based on their structural characteristics. Various machine learning algorithms and clustering methodologies are explored, revealing a consistent theme of performance degradation in the presence of vertex shuffling. We also explore how graph matching algorithms can potentially mitigate the effects of vertex misalignment and recover the lost performance. Our findings also highlight the risk of graph matching as a pre-processing tool, as it can induce artificial signal. Taken together, these results illustrate the difficulties and subtleties of addressing vertex shuffling across multiple network inference tasks and suggest avenues for future research to enhance the robustness of statistical inference methodologies in complex network environments.