UMD Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/3

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Browse

Search Results

Now showing 1 - 10 of 17
  • Learning Autonomous Underwater Navigation with Bearing-Only Data
    (2024) Robertson, James; Duraiswami, Ramani; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Recent applications of deep reinforcement learning to the control of maritime autonomous surface vessels have shown promise for integration into maritime transportation. These systems have the potential to reduce at-sea incidents such as collisions and groundings, which are largely attributed to human error. With this in mind, the goal of this work is to evaluate how well a similar deep reinforcement learning agent could perform the same task aboard submarines, using passive SONAR rather than the ranging data provided by active RADAR aboard surface vessels. A simulated submarine outfitted with a passive spherical, hull-mounted SONAR sensor is placed into contact scenarios under the control of a reinforcement learning agent and directed to make its way to a navigational waypoint while avoiding interfering surface vessels. To see how this best translates to lower-power autonomous vessels (as opposed to warship submarines), no estimate of the surface vessels' range is maintained, which reduces computing requirements. Inspired by my time aboard U.S. Navy submarines, I provide the agent with only the simulated passive SONAR data. I show that this agent is capable of navigating to a waypoint while avoiding crossing, overtaking, and head-on surface vessels, and thus could provide a recommended course to a submarine contact-management team in ample time, since the maneuvers made by the agent are not instantaneous, in contrast to the assumptions of traditional bearing-only target tracking. Additionally, an in-progress plugin for Epic Games’ Unreal Engine is presented, with the ability to simulate underwater acoustics inside the 3D development software. Unreal Engine is a powerful, flexible 3D game engine capable of being integrated into many different forms of scientific research. This plugin could provide researchers with the ability to conduct useful simulations in intuitively designed 3D environments.
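To make the bearing-only setup concrete, the sketch below (an editorial illustration, not code from the thesis; the class and field names are hypothetical) shows an observation vector that carries bearings to contacts and to the waypoint, but no ranges:

```python
import numpy as np

class BearingOnlyNav:
    """Toy bearing-only navigation scenario: the agent observes only bearings
    (no ranges) to surface contacts, plus the bearing to its waypoint."""

    def __init__(self, n_contacts=3, seed=0):
        rng = np.random.default_rng(seed)
        self.own_pos = np.zeros(2)
        self.waypoint = rng.uniform(-1000, 1000, 2)
        self.contacts = rng.uniform(-1000, 1000, (n_contacts, 2))

    def observe(self):
        # Bearings only: angles to each contact and to the waypoint.
        rel = self.contacts - self.own_pos
        contact_bearings = np.arctan2(rel[:, 1], rel[:, 0])
        wp = self.waypoint - self.own_pos
        return np.concatenate([contact_bearings, [np.arctan2(wp[1], wp[0])]])

    def step(self, course, speed=5.0, dt=1.0):
        # Advance own ship along the commanded course; reward progress toward
        # the waypoint, leaving collision penalties to a fuller simulation.
        before = np.linalg.norm(self.waypoint - self.own_pos)
        self.own_pos += speed * dt * np.array([np.cos(course), np.sin(course)])
        return self.observe(), before - np.linalg.norm(self.waypoint - self.own_pos)

env = BearingOnlyNav()
obs = env.observe()
obs, reward = env.step(course=obs[-1])  # naively steer straight at the waypoint
```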
  • Learning-based Motion Planning for High-DoF Robot Systems
    (2023) Jia, Biao; Manocha, Dinesh; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    A high-degree-of-freedom (DoF) robot system refers to a type of robotic system that possesses many independently controllable mechanical degrees of freedom. This includes high-DoF robots and the objects being manipulated, such as flexible robotic arms and flexible objects. Degrees of freedom in robotics represent the different ways a robot can move or manipulate its parts. High-DoF robot systems have a significant number of these independent motions, allowing them to exhibit complex and versatile movements and behaviors. These systems are employed in various applications, including manufacturing and healthcare, where precise and flexible control is essential. The main difficulty associated with high-DoF robot systems is the complexity arising from their numerous degrees of freedom. Calculating the optimal trajectories or control inputs for high-DoF systems can be computationally intensive. The sheer number of variables and the need for real-time responsiveness pose significant challenges in terms of computation and control. In some cases, high-DoF robot systems interact with deformable objects such as fabrics and foam. Modeling and controlling these objects adds additional layers of complexity due to their dynamic and unpredictable behavior. To address these challenges, we delve into several key areas: Object Deformation Modeling, Controller Parameterization, System Identification, Control Policy Learning, and Sim-to-Real Transfer. We begin by using cloth manipulation as an example to illustrate how to model high-DoF objects and design mapping relationships. By leveraging computer vision and visual feedback-based controllers, we enhance the ability to model and control objects with substantial shape variations, which is particularly valuable in applications involving deformable materials. Next, we shift our focus to Controller Parameterization, aiming to define control parameters for high-DoF objects. We employ a random forest-based controller along with imitation learning, resulting in more robust and efficient controllers, which are essential for high-DoF robot systems. This method can be used for human-robot collaboration involving flexible objects and enables imitation learning to converge in as few as 4-5 iterations. Furthermore, we explore how to reduce the dimensionality of both high-DoF robot systems and objects simultaneously. Our system allows for the more effective use of computationally intensive methods like reinforcement learning (RL) or trajectory optimization. We also design a system identification method to reduce the need for repeated rendering or experiments, significantly improving the efficiency of RL. This enables some algorithms with exponential computational complexity to be solved in linear time. In this part of the work, we adopt a real setup where humans and robots collaborate in real time to manipulate flexible objects. In the second part of our research, we focus on the task of natural media painting, using reinforcement learning techniques. Painting itself can be considered a high-DoF robot system, as it entails a multitude of context-dependent actions to complete the task. Our objective is to replicate a reference image using brush strokes, with the goal encoded through observations. We focus on how to address the sparse reward distribution with a large continuous action space.
Additionally, we investigate the practicality of transferring learned policies from simulated environments to real-world scenarios, with a specific focus on tasks like painting. This research bridges the gap between simulation and practical application, ensuring that the knowledge gained from our work can be effectively utilized in real-world settings. Ultimately, we demonstrate the use of RL-learned painting strategies in both virtual and real robot environments.
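One common way to handle the sparse-reward problem in stroke-based painting (a sketch under assumed details, not necessarily the dissertation's reward design) is to densify the reward by scoring each stroke by how much it brings the canvas closer to the reference image:

```python
import numpy as np

def stroke_reward(canvas_before, canvas_after, reference):
    """Dense per-stroke reward: the improvement in L2 distance to the
    reference. A sparse alternative would pay out only at episode end."""
    d_before = np.linalg.norm(canvas_before - reference)
    d_after = np.linalg.norm(canvas_after - reference)
    return d_before - d_after  # positive if the stroke helped

# Toy usage with 64x64 grayscale "images"
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
canvas = np.zeros((64, 64))
after = canvas.copy()
after[10:20, 10:20] = reference[10:20, 10:20]  # a "stroke" copying one patch
print(stroke_reward(canvas, after, reference))  # > 0: reward for progress
```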
  • Scalable Methods for Robust Machine Learning
    (2023) Levine, Alexander Jacob; Feizi, Soheil; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In recent years, machine learning systems have been developed that demonstrate remarkable performance on many tasks. However, naive metrics of performance, such as the accuracy of a classifier on test samples drawn from the same distribution as the training set, can provide an overly optimistic view of the suitability of a model for real-world deployment. In this dissertation, we develop models that are robust, in addition to performing well on large-scale tasks. One notion of robustness is adversarial robustness, which characterizes the performance of models under adversarial attacks. Adversarial attacks are small, often imperceptible, distortions to the inputs of machine learning systems which are crafted to substantially change the output of the system. These attacks represent a real security threat, and are especially concerning when machine learning systems are used in safety-critical applications. To mitigate this threat, certifiably robust classification techniques have been developed. A certifiably robust classifier produces, for each input sample, not only a classification but also a certificate: a guaranteed lower bound on the magnitude of any perturbation required to change the classification. Existing methods for certifiable robustness have significant limitations, which we address in Parts I and II of this dissertation: (i) Currently, randomized smoothing techniques are the only certification techniques that are viable for large-scale image classification (i.e., ImageNet). However, randomized smoothing techniques generally provide only high-probability, rather than exact, certificate results. To address this, we develop deterministic randomized smoothing-based algorithms, which produce exact certificates with finite computational costs. In particular, in Part I of this dissertation, we present, to our knowledge, the first deterministic, ImageNet-scale certification methods under the L_1, L_p (for p < 1), and "L_0" metrics. (ii) Certification results only apply to particular metrics of perturbation size. There is therefore a need to develop new techniques to provide provable robustness against different types of attacks. In Part II of this dissertation, we develop randomized smoothing-based algorithms for several new types of adversarial perturbation, including Wasserstein adversarial attacks, Patch adversarial attacks, and Data Poisoning attacks. The methods developed for Patch and Poisoning attacks are also deterministic, allowing for efficient exact certification. In Part III of this dissertation, we consider a different notion of robustness: test-time adaptability to new objectives in reinforcement learning. This is formalized as goal-conditioned reinforcement learning (GCRL), in which each episode is conditioned by a new "goal," which determines the episode's reward function. In this work, we explore a connection between off-policy GCRL and knowledge distillation, which leads us to apply Gradient-Based Attention Transfer, a knowledge distillation technique, to the Q-function update. We show, empirically and theoretically, that this can improve the performance of off-policy GCRL when the space of goals is high-dimensional.
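For context, the high-probability certificates that the deterministic methods improve upon come from the classic Gaussian L2 randomized smoothing recipe, sketched below (this is the standard construction from the smoothing literature, not the dissertation's deterministic L1/L0 methods):

```python
import numpy as np
from scipy.stats import norm

def smoothed_certificate(base_classifier, x, sigma=0.25, n=1000, seed=0):
    """Classify many Gaussian-noised copies of x; the smoothed prediction is
    the majority vote, and the vote fraction p yields an L2 certified radius
    sigma * Phi^-1(p). Real methods replace p with a lower confidence bound,
    which is why the resulting certificate holds only with high probability."""
    rng = np.random.default_rng(seed)
    noisy = x[None, :] + sigma * rng.standard_normal((n, x.size))
    votes = np.bincount([base_classifier(z) for z in noisy])
    top = int(votes.argmax())
    p_hat = min(votes[top] / n, 1.0 - 1.0 / n)  # cap to avoid infinite radius
    radius = sigma * norm.ppf(p_hat) if p_hat > 0.5 else 0.0
    return top, radius

# Toy base classifier on 2-D inputs: the sign of the first coordinate
clf = lambda z: int(z[0] > 0)
print(smoothed_certificate(clf, np.array([0.8, -0.3])))
```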
  • The Learning and Usage of Second Language Speech Sounds: A Computational and Neural Approach
    (2023) Thorburn, Craig Adam; Feldman, Naomi H; Linguistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Language learners need to map a continuous, multidimensional acoustic signal to discrete abstract speech categories. The complexity of this mapping poses a difficult learning problem, particularly for second language learners, who struggle to acquire the speech sounds of a non-native language and almost never reach native-like ability. A common example used to illustrate this phenomenon is the distinction between /r/ and /l/ (Goto, 1971). While these sounds are distinct in English and native English speakers easily distinguish the two, native Japanese speakers find this difficult, as the sounds are not contrastive in their language. Even with much explicit training, Japanese speakers do not seem to be able to reach native-like ability (Logan, Lively, & Pisoni, 1991; Lively, Logan, & Pisoni, 1993). In this dissertation, I closely explore the mechanisms and computations that underlie effective second-language speech sound learning. I study a case of particularly effective learning: a video game paradigm where non-native speech sounds have functional significance (Lim & Holt, 2011). I discuss the relationship with a Dual Systems Model of auditory category learning and extend this model, bringing it together with the idea of perceptual space learning from infant phonetic learning. In doing this, I describe why different category types are better learned in different experimental paradigms and when different neural circuits are engaged. I propose a novel split in which different learning systems are able to update different stages of the acoustic-phonetic mapping from speech to abstract categories. To do this, I formalize the video game paradigm computationally and implement a deep reinforcement learning network to map between environmental input and actions. In addition, I study how these categories could be used during online processing through an MEG study in which second-language learners of English listen to continuous naturalistic speech. I show that despite the challenges of speech sound learning, second language listeners are able to predict upcoming material, integrating different levels of contextual information, and show responses similar to those of native English speakers. I discuss the implications of these findings and how they could be integrated with the literature on the nature of speech representation in a second language.
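The kind of mapping the video game paradigm requires, from acoustic input to game actions, can be pictured with a small policy network (a hedged sketch only; the architecture and dimensions here are hypothetical, not those of the dissertation's model):

```python
import torch
import torch.nn as nn

# Sketch: a small network mapping acoustic features for a heard speech token
# to discrete game actions, so that sound categories acquire functional
# significance. The 13-dimensional input and 4 actions are assumptions.
policy = nn.Sequential(
    nn.Linear(13, 64),
    nn.ReLU(),
    nn.Linear(64, 4),
)

features = torch.randn(1, 13)            # one simulated speech token
action = policy(features).argmax(dim=1)  # greedy action choice
print(action.item())
```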
  • Multi-Agent Reinforcement Learning: Systems for Evaluation and Applications to Complex Systems
    (2023) Terry, Jordan; Dickerson, John; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Reinforcement learning is a field of artificial intelligence that studies methods for agents to learn, by trial and error, to take actions in a given system. Famous examples include learning to control real robots and achieving superhuman performance in many of the most popular and challenging games for humans. In order to conduct research in this space, researchers use standardized "environments", such as robotics simulations or video games, to evaluate the performance of learning methods. This thesis covers PettingZoo, a widely used library that offers a standardized API and set of reference environments for multi-agent reinforcement learning; SuperSuit, a library that offers easy-to-use, standardized preprocessing wrappers for interfacing with learning libraries; and extensions to the Arcade Learning Environment (a popular tool which reinforcement learning researchers use to interact with Atari 2600 games) that add support for multiplayer game modes. Using these tools, this thesis also uses multi-agent reinforcement learning to develop a new tool for natural science research. Emergent behaviors refer to the coordinated behaviors of groups of agents, such as pedestrians in a crosswalk, birds in flocking formations, cars in traffic, or traders in the stock market, and represent some of the most important yet least understood phenomena across many fields of science. In this work, we introduce the first mathematical formalism for the systematic search of all possible good ("mature") emergent behaviors within a multi-agent system through multi-agent reinforcement learning (MARL), and create a naive implementation of this search via deep reinforcement learning that can be applied in arbitrary environments. We show that in 12 multi-agent systems, this naive method is able to find over a hundred total emergent behaviors, the majority of which were previously unknown to the environment authors. Such methods could allow for answering various types of open scientific questions, such as "What behaviors are possible in this system?", "What specific conditions in this system allow for this kind of emergent behavior?", or "How can I change this system to prevent this emergent behavior?"
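For readers unfamiliar with PettingZoo, its agent-iteration API looks roughly like the following (per the documentation for recent releases; older versions return a single done flag from env.last() instead of the termination/truncation pair):

```python
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.env()
env.reset(seed=42)
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    # Agents act in turn; a terminated or truncated agent must step None.
    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()  # random-policy placeholder
    env.step(action)
env.close()
```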
  • Efficient Environment Sensing and Learning for Mobile Robots
    (2022) Suryan, Varun; Tokekar, Pratap; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Data-driven learning is becoming an integral part of many robotic systems. Robots can be used as mobile sensors to learn about the environment in which they operate. Robots can also seek to learn essential skills, such as navigation, within the environment. A critical challenge in both types of learning is sample efficiency. Acquiring samples with physical robots can be prohibitively time-consuming. As a result, when applying learning techniques in robotics that require physical interaction with the environment, minimizing the number of such interactions becomes key. The key question we seek to answer is: how do we make robots learn efficiently with a minimal amount of physical interaction? We approach this question along two fronts: extrinsic learning and intrinsic learning. In extrinsic learning, we want the robot to learn about the external environment in which it is operating. In intrinsic learning, our focus is on the robot learning a skill, such as navigating in an environment, using reinforcement learning (RL). In this dissertation, we develop algorithms that carefully plan where the robots obtain samples in order to efficiently perform intrinsic and extrinsic learning. In particular, we exploit the structural properties of Gaussian Process (GP) regression to design efficient sampling algorithms. We study two types of problems under extrinsic learning. We start with the problem of efficiently learning a spatially varying field modeled by a GP. Our goal is to ensure that the GP posterior variance, which is also the mean square error between the learned and actual fields, is below a predefined value. By exploiting the underlying properties of GP regression, we present a series of constant-factor approximation algorithms for minimizing the number of stationary sensors to place, minimizing the total time taken by a single robot, and minimizing the time taken by a team of robots to learn the field. Here, we assume that the GP hyperparameters are known. We then study a variant where our goal is to identify the hotspot in an environment. Here, we do not assume that the hyperparameters are known. For this problem, we present Upper Confidence Bound (UCB) and Monte Carlo Tree Search (MCTS) based algorithms for a single robot, and later extend them to decentralized multi-robot teams. We also validate their performance on real-world datasets. For intrinsic learning, our aim is to reduce the number of physical interactions by leveraging simulations, a framework often known as Multi-Fidelity Reinforcement Learning (MFRL). In the MFRL framework, an agent uses multiple simulators of the real environment to perform actions. We present two versions of the MFRL framework, model-based and model-free, that leverage GPs to learn the optimal policy in a real-world environment. By incorporating GPs in the MFRL framework, we empirically observe a significant reduction in the number of samples needed for model-based and model-free learning.
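The idea of planning where to sample so that GP posterior variance falls below a target can be illustrated with a greedy toy loop (an editorial sketch using scikit-learn; the dissertation's approximation algorithms additionally account for sensor placement costs and robot travel time, which this loop ignores):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
field = lambda x: np.sin(3 * x[:, 0])           # unknown 1-D field (toy)
candidates = np.linspace(0, 2 * np.pi, 200)[:, None]

# Start with one measurement, then repeatedly measure wherever the GP
# posterior variance is highest, until it is below target everywhere.
X, y = candidates[[0]], field(candidates[[0]])
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
target_var = 0.01

for _ in range(30):
    gp.fit(X, y)
    _, std = gp.predict(candidates, return_std=True)
    if std.max() ** 2 < target_var:             # variance target met
        break
    nxt = candidates[[std.argmax()]]            # most uncertain location
    X, y = np.vstack([X, nxt]), np.concatenate([y, field(nxt)])

print(f"samples used: {len(X)}")
```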
  • SENSING AND CONTROL UNDER RESOURCE CONSTRAINTS AND UNCERTAINTY: RISK NEUTRAL AND RISK SENSITIVE APPROACHES
    (2022) Hartman, David; Baras, John S; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In network estimation and control systems like sensor networks or industrial robotic systems, there are often restrictions or uncertainties that must be taken into account. For example, there are often bandwidth and communication constraints on the estimators or controllers. Additionally, the dynamics model is not always known. Lastly, noise or exogenous disturbances can adversely affect the system. This thesis addresses three problems in sensing and control, in both the H2 and risk-sensitive control settings. The first problem stems from restrictions on the communications and battery life of sensors. Because of these restrictions, when estimating a state in a system we must cleverly schedule which sensors can be active. The second problem also stems from communication restrictions. In this setting, the sensors and actuators can only communicate with a small number of neighboring sensors. Therefore, we must solve a distributed control problem. The third problem stems from the dynamics of a system being unknown. In this regard, we must solve a control problem using simulated data instead of a fixed model. The research in this thesis utilizes tools from optimization, estimation, control, and dynamic programming.
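The first problem, sensor scheduling, can be illustrated with a toy greedy rule: with a budget of one active sensor per step, activate the sensor whose measurement most shrinks the Kalman posterior covariance (an editorial sketch of the scheduling idea, not the thesis's formulation):

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])        # state dynamics
Q = 0.01 * np.eye(2)                           # process noise covariance
sensors = [np.array([[1.0, 0.0]]),             # sensor 0 observes position
           np.array([[0.0, 1.0]])]             # sensor 1 observes velocity
R = 0.1                                        # measurement noise variance

P = np.eye(2)                                  # initial state covariance
for t in range(5):
    P = A @ P @ A.T + Q                        # prediction step
    # Greedily pick the sensor minimizing the post-update covariance trace.
    best = min(range(len(sensors)), key=lambda i: np.trace(
        P - P @ sensors[i].T @ np.linalg.inv(
            sensors[i] @ P @ sensors[i].T + R) @ sensors[i] @ P))
    H = sensors[best]
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    P = (np.eye(2) - K @ H) @ P                # update with the chosen sensor
    print(f"t={t}: activate sensor {best}, trace(P)={np.trace(P):.4f}")
```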
  • EXPERT-IN-THE-LOOP FOR SEQUENTIAL DECISIONS AND PREDICTIONS
    (2021) Brantley, Kiante; Daumé III, Hal; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Sequential decisions and predictions are common problems in natural language processing, robotics, and video games. Essentially, an agent interacts with an environment to learn how to solve a particular problem. Research in sequential decisions and predictions has increased due in part to the success of reinforcement learning. However, this success has come at the cost of algorithms being very data-inefficient, making learning in the real world difficult. Our primary goal is to make these algorithms more data-efficient using an expert in the loop (e.g., imitation learning). Imitation learning is a technique for using an expert in sequential decision and prediction problems. Naive imitation learning has a covariate shift problem (i.e., the training distribution differs from the test distribution). We propose methods and ideas to address this issue, as well as other issues that arise in different styles of imitation learning. In particular, we study three broad areas of using an expert in the loop for sequential decisions and predictions. First, we study the most popular category of imitation learning, interactive imitation learning. Although interactive imitation learning addresses issues around the covariate shift problem in naive imitation learning, it does this with a trade-off: it assumes access to an online interactive expert, which is unrealistic. Instead, we propose a setting where this assumption is realistic and attempt to reduce the number of queries needed for interactive imitation learning. We further introduce a new category of imitation learning algorithms called Reward-Learning Imitation Learning. Unlike interactive imitation learning, these algorithms address the covariate shift using only demonstration data instead of querying an online interactive expert. This category of imitation learning algorithms assumes access to an underlying reinforcement learning algorithm that can optimize a reward function learned from demonstration data. We benchmark all algorithms in this category and relate them to modern structured prediction NLP problems. Beyond reward-learning imitation learning and interactive imitation learning, some problems cannot be naturally expressed and solved using these two categories of algorithms; for example, learning to solve a particular problem while also satisfying safety constraints. We introduce expert-in-the-loop techniques that extend beyond traditional imitation learning paradigms, where an expert provides demonstration features or constraints instead of state-action pairs.
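The covariate-shift fix that interactive imitation learning provides is easiest to see in the canonical DAgger loop (Ross et al., 2011), sketched schematically below; the env, expert, and learner interfaces are hypothetical placeholders, and this is the textbook algorithm rather than the dissertation's reduced-query variant:

```python
def dagger(env, expert, learner, iterations=10, horizon=100):
    """Schematic DAgger: the learner rolls out its own policy, the online
    expert labels the states the learner actually visits, and training on
    the aggregated dataset corrects the covariate shift of naive cloning."""
    dataset = []
    for _ in range(iterations):
        state = env.reset()
        for _ in range(horizon):
            action = learner.act(state)                 # learner's rollout...
            dataset.append((state, expert.act(state)))  # ...expert labels it
            state, done = env.step(action)              # hypothetical API
            if done:
                break
        learner.fit(dataset)                            # retrain on aggregate
    return learner
```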
  • THE ROLE OF THE VENTRAL STRIATUM AND AMYGDALA IN REINFORCEMENT LEARNING
    (2021) Taswell, Craig Anthony; Butts, Daniel; Averbeck, Bruno; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Adaptive behavior requires that organisms choose wisely to gain rewards and avoid punishment. Reinforcement learning refers to the behavioral process of learning about the value of choices, based on previous choice outcomes. From an algorithmic point of view, rewards and punishments exist on opposite sides of a single value axis. However, simple distinctions between rewards and punishments and their theoretical expression on a single value axis hide considerable psychological complexities that underlie appetitive and aversive reinforcement learning. A broad set of neural circuits, including the amygdala and frontal-striatal systems, have been implicated in mediating learning from gains and losses. The ventral striatum (VS) and amygdala have been implicated in several aspects of this process. To examine the role of the VS and amygdala in learning from gains and losses, we compared the performance of macaque monkeys with VS lesions, monkeys with amygdala lesions, and un-operated controls on a series of reinforcement learning tasks. In these tasks, monkeys gained or lost tokens, which were periodically cashed out for juice, as outcomes for choices. We found that monkeys with VS lesions had a deficit in learning to choose between cues that differed in reward magnitude. Monkeys with VS lesions performed as well as controls when choices involved a potential loss. In contrast, we found that monkeys with amygdala lesions performed as well as controls across all conditions. Further analysis revealed that the deficits we found in monkeys with VS lesions resulted from a reduction in motivation, rather than from an inability to learn the stimulus-outcome contingency.
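The underlying learning process is often modeled with a simple delta-rule update of choice value, as sketched below (an editorial illustration with arbitrary parameters, not a model fit from the dissertation):

```python
# Rescorla-Wagner style update: nudge the value estimate toward each outcome.
# Separate learning rates for gains and losses are one way such models can
# capture appetitive/aversive asymmetries; the numbers here are illustrative.
def update_value(v, outcome, alpha_gain=0.2, alpha_loss=0.2):
    alpha = alpha_gain if outcome >= 0 else alpha_loss
    return v + alpha * (outcome - v)

v = 0.0
for outcome in [+1, +1, -1, +1]:   # token gains and losses
    v = update_value(v, outcome)
    print(round(v, 3))
```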
  • Towards Trust and Transparency in Deep Learning Systems through Behavior Introspection & Online Competency Prediction
    (2021) Allen, Julia Filiberti; Gabriel, Steven A.; Mechanical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Deep neural networks are naturally “black boxes”, offering little insight into how or why they make decisions. These limitations diminish the likelihood that such systems will be adopted for important tasks and as trusted teammates. We employ introspective techniques to abstract machine activation patterns into human-interpretable strategies and identify relationships between environmental conditions (why), strategies (how), and performance (result) on both a deep reinforcement learning two-dimensional pursuit game application and an image-based deep supervised learning obstacle recognition application. Pursuit-evasion games have been studied for decades under perfect information, with analytically derived policies for static environments. We incorporate uncertainty in a target’s position via simulated measurements and demonstrate a novel continuous deep reinforcement learning approach against speed-advantaged targets. The resulting approach was tested under many scenarios, and its performance exceeded that of a baseline course-aligned strategy. We manually observed the separation of learned pursuit behaviors into strategy groups and hypothesized environmental conditions that affected performance. These manual observations motivated the automation and abstraction of conditions, performance, and strategy relationships. Next, we found that deep network activation patterns could be abstracted into human-interpretable strategies for two separate deep learning approaches. We characterized machine commitment by introducing a novel measure and revealed significant correlations between machine commitment, strategies, environmental conditions, and task performance. As such, we motivated online exploitation of machine behavior estimation for competency-aware intelligent systems. Finally, we realized online prediction capabilities for conditions, strategies, and performance. Our competency-aware machine learning approach is easily portable to new applications due to its Bayesian nonparametric foundation, wherein all inputs are transformed into the same compact data representation. In particular, image data is transformed into a probability distribution over features extracted from the data. The resulting transformation forms a common representation for comparing two images, possibly from different types of sensors. By uncovering relationships between environmental conditions (why), machine strategies (how), and performance (result), and by giving rise to online estimation of machine competency, we increase transparency and trust in machine learning systems, contributing to the overarching explainable artificial intelligence initiative.
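Abstracting activation patterns into discrete strategies can be pictured with a simple clustering stand-in (the dissertation's approach is Bayesian nonparametric; k-means over hidden activations is used here purely as an editorial toy):

```python
import numpy as np
from sklearn.cluster import KMeans

# Record hidden-layer activations across episodes, then treat cluster
# membership as a discrete, human-inspectable "strategy" label.
rng = np.random.default_rng(0)
activations = np.vstack([
    rng.normal(0.0, 0.3, (50, 8)),   # episodes with one behavior pattern
    rng.normal(2.0, 0.3, (50, 8)),   # episodes with another
])

strategies = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(activations)
print(np.bincount(strategies))       # episodes per inferred strategy
```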