Computer Science Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2756


Search Results

Now showing 1 - 7 of 7
  • Item
    Learning-based Motion Planning for High-DoF Robot Systems
    (2023) Jia, Biao; Manocha, Dinesh; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    A high-degree-of-freedom (high-DoF) robot system is a robotic system that possesses many independently controllable mechanical degrees of freedom. This includes high-DoF robots or objects being manipulated, such as flexible robotic arms and flexible objects. Degrees of freedom in robotics represent the different ways a robot can move or manipulate its parts. High-DoF robot systems have a significant number of these independent motions, allowing them to exhibit complex and versatile movements and behaviors. These systems are employed in various applications, including manufacturing and healthcare, where precise and flexible control is essential. The main difficulty associated with high-DoF robot systems is the complexity arising from their numerous degrees of freedom. Calculating optimal trajectories or control inputs for high-DoF systems can be computationally intensive. The sheer number of variables and the need for real-time responsiveness pose significant challenges in terms of computation and control. In some cases, high-DoF robot systems interact with deformable objects such as fabrics and foam. Modeling and controlling these objects adds additional layers of complexity due to their dynamic and unpredictable behavior. To address these challenges, we delve into several key areas: Object Deformation Modeling, Controller Parameterization, System Identification, Control Policy Learning, and Sim-to-Real Transfer. We begin by using cloth manipulation as an example to illustrate how to model high-DoF objects and design mapping relationships. By leveraging computer vision and visual feedback-based controllers, we enhance the ability to model and control objects with substantial shape variations, which is particularly valuable in applications involving deformable materials. Next, we shift our focus to Controller Parameterization, aiming to define control parameters for high-DoF objects. We employ a random forest-based controller along with imitation learning, resulting in more robust and efficient controllers, which are essential for high-DoF robot systems. This method can be used for human-robot collaboration involving flexible objects and enables imitation learning to converge in as few as 4-5 iterations. Furthermore, we explore how to reduce the dimensionality of both high-DoF robot systems and objects simultaneously. Our system allows for the more effective use of computationally intensive methods like reinforcement learning (RL) or trajectory optimization. We therefore design a system identification method that reduces the need for repeated rendering or experiments, significantly improving the efficiency of RL and enabling some algorithms with exponential computational complexity to be solved in linear time. In this part of the work, we adopt a real setup in which humans and robots collaborate in real time to manipulate flexible objects. In the second part of our research, we focus on the task of natural media painting using reinforcement learning techniques. Painting itself can be considered a high-DoF robot system, as it entails a multitude of context-dependent actions to complete the task. Our objective is to replicate a reference image using brush strokes, with the goal encoded through observations. We focus on how to address the sparse reward distribution in a large continuous action space.
Additionally, we investigate the practicality of transferring learned policies from simulated environments to real-world scenarios, with a specific focus on tasks like painting. This research bridges the gap between simulation and practical application, ensuring that the knowledge gained from our work can be effectively utilized in real-world settings. Ultimately, we will demonstrate the use of RL-learned painting strategies in both virtual and real robot environments.
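    The abstract does not spell out implementation details, but the reward-shaping idea behind the painting task (turning a sparse match-the-reference objective into per-stroke feedback in a large continuous action space) can be sketched as follows. This is a minimal illustrative sketch under assumed names: `render_stroke` and the numpy canvases are hypothetical stand-ins, and the reward shown (reduction in L2 distance to the reference image) is one common way to densify the reward, not necessarily the exact formulation used in this work.

```python
import numpy as np

def stroke_reward(canvas_before: np.ndarray,
                  canvas_after: np.ndarray,
                  reference: np.ndarray) -> float:
    """Dense per-stroke reward: reduction in L2 distance to the reference image.

    A sparse alternative would reward the agent only when the finished canvas
    matches the reference, which is much harder to learn from.
    """
    dist_before = np.linalg.norm(canvas_before - reference)
    dist_after = np.linalg.norm(canvas_after - reference)
    return float(dist_before - dist_after)  # positive if the stroke helped

# Hypothetical usage inside a painting rollout:
# canvas_next = render_stroke(canvas, action)        # renderer assumed to exist
# r = stroke_reward(canvas, canvas_next, reference)
```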
  • Item
    Scalable Methods for Robust Machine Learning
    (2023) Levine, Alexander Jacob; Feizi, Soheil; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In recent years, machine learning systems have been developed that demonstrate remarkable performance on many tasks. However, naive metrics of performance, such as the accuracy of a classifier on test samples drawn from the same distribution as the training set, can provide an overly optimistic view of the suitability of a model for real-world deployment. In this dissertation, we develop models that are robust, in addition to performing well on large-scale tasks. One notion of robustness is adversarial robustness, which characterizes the performance of models under adversarial attacks. Adversarial attacks are small, often imperceptible, distortions to the inputs of machine learning systems which are crafted to substantially change the output of the system. These attacks represent a real security threat, and are especially concerning when machine learning systems are used in safety-critical applications. To mitigate this threat, certifiably robust classification techniques have been developed. In a certifiably robust classifier, for each input sample, in addition to a classification, the classifier also produces a certificate, which is a guaranteed lower bound on the magnitude of any perturbation required to change the classification. Existing methods for certifiable robustness have significant limitations, which we address in Parts I and II of this dissertation: (i) Currently, randomized smoothing techniques are the only certification techniques that are viable for large-scale image classification (i.e., ImageNet). However, randomized smoothing techniques generally provide only high-probability, rather than exact, certificate results. To address this, we develop deterministic randomized smoothing-based algorithms, which produce exact certificates with finite computational costs. In particular, in Part I of this dissertation, we present, to our knowledge, the first deterministic, ImageNet-scale certification methods under the L_1, L_p (for p < 1), and "L_0" metrics. (ii) Certification results only apply to particular metrics of perturbation size. There is therefore a need to develop new techniques to provide provable robustness against different types of attacks. In Part II of this dissertation, we develop randomized smoothing-based algorithms for several new types of adversarial perturbation, including Wasserstein adversarial attacks, patch adversarial attacks, and data poisoning attacks. The methods developed for patch and poisoning attacks are also deterministic, allowing for efficient exact certification. In Part III of this dissertation, we consider a different notion of robustness: test-time adaptability to new objectives in reinforcement learning. This is formalized as goal-conditioned reinforcement learning (GCRL), in which each episode is conditioned by a new "goal," which determines the episode's reward function. In this work, we explore a connection between off-policy GCRL and knowledge distillation, which leads us to apply Gradient-Based Attention Transfer, a knowledge distillation technique, to the Q-function update. We show, empirically and theoretically, that this can improve the performance of off-policy GCRL when the space of goals is high-dimensional.
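    The dissertation's contributions are deterministic certification methods; as background only, the generic Monte Carlo randomized smoothing procedure that those methods improve upon can be sketched as follows. This is an illustrative sketch, not the algorithms developed in Parts I and II: `base_classifier` is a hypothetical callable, and the radius shown is the standard Gaussian-smoothing L2 bound sigma * Phi^{-1}(p), using an empirical estimate of p where a rigorous certificate would use a high-confidence lower bound.

```python
import numpy as np
from scipy.stats import norm

def smoothed_predict_and_certify(base_classifier, x, sigma=0.5, n=1000, seed=0):
    """Monte Carlo randomized smoothing (illustrative, L2/Gaussian case).

    base_classifier: hypothetical function mapping an input array to a class id.
    Returns the majority-vote class and a certified L2 radius sigma * Phi^{-1}(p_hat);
    a rigorous certificate would replace p_hat with a high-confidence lower bound
    on the top-class probability.
    """
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n):
        noisy = x + sigma * rng.standard_normal(x.shape)
        c = base_classifier(noisy)
        votes[c] = votes.get(c, 0) + 1
    top_class, top_votes = max(votes.items(), key=lambda kv: kv[1])
    p_hat = top_votes / n
    radius = sigma * norm.ppf(p_hat) if p_hat > 0.5 else 0.0  # abstain -> radius 0
    return top_class, radius
```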
  • Item
    Multi-Agent Reinforcement Learning: Systems for Evaluation and Applications to Complex Systems
    (2023) Terry, Jordan; Dickerson, John; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Reinforcement learning is a field of artificial intelligence that studies methods for agents to learn, by trial and error, to take actions in a given system. Famous examples include learning to control real robots and achieving superhuman performance in many of the most popular and challenging games for humans. In order to conduct research in this space, researchers use standardized "environments", such as robotics simulations or video games, to evaluate the performance of learning methods. This thesis covers PettingZoo, a now widely used library that offers a standardized API and a set of reference environments for multi-agent reinforcement learning; SuperSuit, a library that offers easy-to-use, standardized preprocessing wrappers for interfacing with learning libraries; and extensions to the Arcade Learning Environment (a popular tool which reinforcement learning researchers use to interact with Atari 2600 games) that add support for multiplayer game modes. Building on these tools, this thesis also uses multi-agent reinforcement learning to develop a new tool for natural science research. Emergent behaviors refer to the coordinated behaviors of groups of agents, such as pedestrians in a crosswalk, birds in flocking formations, cars in traffic, or traders in the stock market, and represent some of the most important yet least understood phenomena across many fields of science. In this work, we introduce the first mathematical formalism for the systematic search for all possible good ("mature") emergent behaviors within a multi-agent system through multi-agent reinforcement learning (MARL), and create a naive implementation of this search via deep reinforcement learning that can be applied in arbitrary environments. We show that across 12 multi-agent systems, this naive method is able to find over a hundred emergent behaviors in total, the majority of which were previously unknown to the environment authors. Such methods could allow for answering various types of open scientific questions, such as "What behaviors are possible in this system?", "What specific conditions in this system allow for this kind of emergent behavior?", or "How can I change this system to prevent this emergent behavior?"
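    The standardized API mentioned above is PettingZoo's agent-environment-cycle (AEC) interface. The loop below follows the pattern documented by PettingZoo for recent releases; module names such as `pistonball_v6` and the exact tuple returned by `env.last()` vary across versions, so treat the details as version-dependent rather than definitive.

```python
# Typical PettingZoo AEC interaction loop (API details vary slightly across versions).
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.env()
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None                                  # finished agents must pass None
    else:
        action = env.action_space(agent).sample()      # random policy placeholder
    env.step(action)

env.close()
```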
  • Item
    Efficient Environment Sensing and Learning for Mobile Robots
    (2022) Suryan, Varun; Tokekar, Pratap; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Data-driven learning is becoming an integral part of many robotic systems. Robots can be used as mobile sensors to learn about the environment in which they operate. Robots can also seek to learn essential skills, such as navigation, within the environment. A critical challenge in both types of learning is sample efficiency. Acquiring samples with physical robots can be prohibitively time-consuming. As a result, when applying learning techniques in robotics that require physical interaction with the environment, minimizing the number of such interactions becomes key. The central question we seek to answer is: how do we make robots learn efficiently with a minimal amount of physical interaction? We approach this question along two fronts: extrinsic learning and intrinsic learning. In extrinsic learning, we want the robot to learn about the external environment in which it is operating. In intrinsic learning, our focus is on the robot learning a skill, such as navigating in an environment, using reinforcement learning (RL). In this dissertation, we develop algorithms that carefully plan where the robots obtain samples in order to efficiently perform intrinsic and extrinsic learning. In particular, we exploit the structural properties of Gaussian Process (GP) regression to design efficient sampling algorithms. We study two types of problems under extrinsic learning. We start with the problem of efficiently learning a spatially varying field modeled by a GP. Our goal is to ensure that the GP posterior variance, which is also the mean square error between the learned and actual fields, is below a predefined value. By exploiting the underlying properties of the GP, we present a series of constant-factor approximation algorithms for minimizing the number of stationary sensors to place, minimizing the total time taken by a single robot, and minimizing the time taken by a team of robots to learn the field. Here, we assume that the GP hyperparameters are known. We then study a variant where our goal is to identify the hotspot in an environment; here, we do not assume that the hyperparameters are known. For this problem, we present Upper Confidence Bound (UCB) and Monte Carlo Tree Search (MCTS) based algorithms for a single robot and later extend them to decentralized multi-robot teams. We also validate their performance on real-world datasets. For intrinsic learning, our aim is to reduce the number of physical interactions by leveraging simulations, an approach often known as Multi-Fidelity Reinforcement Learning (MFRL). In the MFRL framework, an agent uses multiple simulators of the real environment to perform actions. We present two versions of the MFRL framework, model-based and model-free, that leverage GPs to learn the optimal policy in a real-world environment. By incorporating GPs in the MFRL framework, we empirically observe a significant reduction in the number of samples needed for both model-based and model-free learning.
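    To make the extrinsic-learning objective concrete, the sketch below uses scikit-learn's GP regression to drive the posterior standard deviation over a set of candidate locations below a tolerance by greedily sampling wherever uncertainty is highest. It is a simple illustrative baseline that assumes known hyperparameters and a hypothetical `field_fn` measurement function; it is not one of the constant-factor approximation algorithms developed in the dissertation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def greedy_sampling(field_fn, candidates, tol=0.05, max_samples=50):
    """Greedily sample where GP posterior uncertainty is highest (illustrative).

    field_fn: hypothetical callable returning a noisy measurement at a location.
    candidates: (N, d) array of candidate sensing locations.
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
    X, y = [], []
    for _ in range(max_samples):
        if X:
            gp.fit(np.array(X), np.array(y))
            _, std = gp.predict(candidates, return_std=True)
        else:
            std = np.ones(len(candidates))       # prior: everything is uncertain
        if std.max() < tol:                      # posterior-variance target met
            break
        idx = int(np.argmax(std))                # most uncertain location next
        X.append(candidates[idx])
        y.append(field_fn(candidates[idx]))
    if X:
        gp.fit(np.array(X), np.array(y))         # final fit on all samples
    return gp, np.array(X)
```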
  • Item
    EXPERT-IN-THE-LOOP FOR SEQUENTIAL DECISIONS AND PREDICTIONS
    (2021) Brantley, Kiante; Daumé III, Hal; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Sequential decisions and predictions are common problems in natural language processing, robotics, and video games. Essentially, an agent interacts with an environment to learn how to solve a particular problem. Research in sequential decisions and predictions has increased due in part to the success of reinforcement learning. However, this success has come at the cost of algorithms being very data-inefficient, making learning in the real world difficult. Our primary goal is to make these algorithms more data-efficient using an expert in the loop (e.g., imitation learning). Imitation learning is a technique for using an expert in sequential decision and prediction problems. Naive imitation learning has a covariate shift problem (i.e., the training distribution differs from the test distribution). We propose methods and ideas to address this issue, as well as other issues that arise in different styles of imitation learning. In particular, we study three broad areas of using an expert in the loop for sequential decisions and predictions. First, we study the most popular category of imitation learning, interactive imitation learning. Although interactive imitation learning addresses issues around the covariate shift problem in naive imitation learning, it does so with a trade-off: it assumes access to an online interactive expert, which is unrealistic. Instead, we propose a setting where this assumption is realistic and attempt to reduce the number of queries needed for interactive imitation learning. We further introduce a new category of imitation learning algorithms called Reward-Learning Imitation Learning. Unlike interactive imitation learning, these algorithms address the covariate shift problem using only demonstration data, instead of querying an online interactive expert. This category of imitation learning algorithms assumes access to an underlying reinforcement learning algorithm that can optimize a reward function learned from demonstration data. We benchmark all algorithms in this category and relate them to modern structured prediction problems in NLP. Beyond reward-learning imitation learning and interactive imitation learning, some problems cannot be naturally expressed and solved using these two categories of algorithms, for example, learning to solve a particular problem while also satisfying safety constraints. We introduce expert-in-the-loop techniques that extend beyond traditional imitation learning paradigms, where an expert provides demonstration features or constraints instead of state-action pairs.
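    For readers unfamiliar with the interactive setting, a DAgger-style dataset-aggregation loop, the canonical interactive imitation learning recipe that mitigates covariate shift by querying the expert on states the learner itself visits, can be sketched as follows. All names (`env`, `expert_action`, `policy`) are hypothetical placeholders; the methods in this dissertation refine this basic recipe, in particular by limiting how often the online expert must be queried.

```python
def dagger(env, expert_action, policy, n_iters=10, horizon=200):
    """Minimal DAgger-style loop (illustrative sketch, not the thesis algorithms).

    env: hypothetical environment with reset() and step(action) -> (state, done).
    expert_action: hypothetical oracle mapping a state to the expert's action.
    policy: object exposing act(state) and fit(states, actions).
    """
    dataset_states, dataset_actions = [], []
    for _ in range(n_iters):
        state = env.reset()
        for _ in range(horizon):
            # Label every visited state with the expert's action (covariate-shift fix),
            # but execute the learner's own action so we visit the learner's distribution.
            dataset_states.append(state)
            dataset_actions.append(expert_action(state))
            state, done = env.step(policy.act(state))
            if done:
                break
        policy.fit(dataset_states, dataset_actions)   # retrain on the aggregated data
    return policy
```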
  • Item
    EFFECTIVENESS OF PROXIMAL POLICY OPTIMIZATION METHODS FOR NEURAL PROGRAM INDUCTION
    (2020) Lin, Runxing; Reggia, James Dr.; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The Neural Virtual Machine (NVM) is a novel neurocomputational architecture designed to emulate the functionality of a traditional computer. A version of the NVM called NVM-RL supports reinforcement learning based on standard policy gradient methods as a mechanism for performing neural program induction. In this thesis, I modified NVM-RL using one of the most popular reinforcement learning algorithms, proximal policy optimization (PPO). Surprisingly, using PPO with the existing all-or-nothing reward function did not improve its effectiveness. However, I found that PPO did improve the performance of the existing NVM-RL if one instead used a reward function that grants partial credit for incorrect outputs based on how much those incorrect outputs differ from the correct targets. I conclude that, in some situations, PPO can improve the performance of reinforcement learning during program induction, but that this improvement is dependent on the quality of the reward function that is used.
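    Since the finding above hinges on the shape of the reward function, a sketch contrasting the all-or-nothing reward with a partial-credit alternative may help. The partial-credit function below (fraction of output positions matching the target) is a generic illustration under my own naming, not necessarily the exact function used in the thesis.

```python
def all_or_nothing_reward(output, target):
    """1 only if the program's output exactly matches the target, else 0."""
    return 1.0 if list(output) == list(target) else 0.0

def partial_credit_reward(output, target):
    """Fraction of positions matching the target (illustrative shaping).

    Gives graded feedback even when the output is wrong, which is the property
    that made PPO effective in the experiments described above.
    """
    if not target:
        return 0.0
    matches = sum(1 for o, t in zip(output, target) if o == t)
    return matches / len(target)

print(partial_credit_reward([1, 2, 3, 9], [1, 2, 3, 4]))  # 0.75 vs. 0.0 all-or-nothing
```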
  • Item
    Efficient Non-deterministic Search in Structured Prediction: A Case Study on Syntactic Parsing
    (2014) Jiang, Jiarong; Daume, Hal; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Non-determinism occurs naturally in many search-based machine learning and natural language processing (NLP) problems. For example, the goal of parsing is to construct the syntactic tree structure of a sentence given a grammar. Agenda-based parsing is a dynamic programming approach to finding the most likely syntactic tree of a sentence according to a probabilistic grammar. A chart is used to maintain all the possible subtrees for different spans in the sentence, and an agenda is used to rank all the constituents. The parser chooses only one constituent from the agenda per step. Non-determinism occurs naturally in agenda-based parsing, since a new constituent is often built by combining items from a few steps earlier. Unfortunately, as in most other problems in NLP, the size of the search space is huge and exhaustive search is impossible, yet users expect a fast and accurate system. In this dissertation, I focus on the question "Why, when, and how shall we take advantage of non-determinism?" and show that exploiting it can improve the parser in terms of speed and/or accuracy. Existing approaches such as search-based imitation learning or reinforcement learning methods have different limitations when applied to a large NLP system. The solution proposed in this dissertation is that we should train the system non-deterministically and test it deterministically if possible; I also show that it is better to learn with oracles than with simple heuristics. We start by solving a generic Markov Decision Process with a non-deterministic agent. We show its theoretical convergence guarantees and verify its efficiency on maze-solving problems. Then we focus on agenda-based parsing. To re-prioritize the parser, we model the decoding problem as a Markov Decision Process with a large state/action space. We discuss the advantages and disadvantages of existing techniques and propose a hybrid reinforcement/apprenticeship learning algorithm to trade off speed and accuracy. We also propose a dynamic pruner with features that depend on the run-time status of the chart and agenda, and analyze the importance of those features in the pruning classification. Our models show comparable results with respect to state-of-the-art strategies.
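    To ground the agenda-based parsing discussion, the sketch below shows the core loop: a priority queue (the agenda) of candidate constituents popped in order of a score, with a pluggable `priority` function standing in for the learned re-prioritization policy. The chart, the combination rule, and the goal test are hypothetical stubs; the sketch only indicates where a learned policy would intervene and is not the parser built in this dissertation.

```python
import heapq

def agenda_parse(initial_items, combine, priority, is_goal):
    """Schematic agenda-based parsing loop (illustrative stub, not the thesis system).

    initial_items: iterable of hashable starting constituents (e.g., word/POS items).
    combine(item, chart): hypothetical function yielding new constituents.
    priority(item): score to pop next; a learned policy replaces a simple heuristic here.
    is_goal(item): hypothetical test for a complete parse of the sentence.
    """
    agenda = [(-priority(it), i, it) for i, it in enumerate(initial_items)]
    heapq.heapify(agenda)
    chart, counter = set(), len(agenda)
    while agenda:
        _, _, item = heapq.heappop(agenda)        # best-scored constituent first
        if item in chart:
            continue                               # already derived; skip duplicate
        chart.add(item)
        if is_goal(item):
            return item
        for new_item in combine(item, chart):
            counter += 1
            heapq.heappush(agenda, (-priority(new_item), counter, new_item))
    return None
```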