Theses and Dissertations from UMD
Permanent URI for this community: http://hdl.handle.net/1903/2
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay before a given thesis/dissertation appears in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
Search Results
6 results
Item
Representation Learning for Reinforcement Learning: Modeling Non-Gaussian Transition Probabilities with a Wasserstein Critic (2024)
Tse, Ryan; Zhang, Kaiqing; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Reinforcement learning algorithms depend on effective state representations when solving complex, high-dimensional environments. Recent methods learn state representations using auxiliary objectives that aim to capture relationships between states that are behaviorally similar, meaning states that lead to similar future outcomes under optimal policies. These methods learn explicit probabilistic state transition models and compute distributional distances between state transition probabilities as part of their measure of behavioral similarity. This thesis presents a novel extension to several of these methods that directly learns the 1-Wasserstein distance between state transition distributions by exploiting the Kantorovich-Rubinstein duality. This method eliminates parametric assumptions about the state transition probabilities while providing a smoother estimator of distributional distances. Empirical evaluation demonstrates improved sample efficiency over some of the original methods, at a modest increase in computational cost per sample. The results establish that relaxing theoretical assumptions about state transition modeling leads to more flexible and robust representation learning while maintaining strong performance characteristics.
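To make the duality concrete, here is a minimal sketch of how a critic network can estimate the 1-Wasserstein distance between two batches of sampled next-states via the Kantorovich-Rubinstein dual. The network shape, clipping constant, and optimizer settings below are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal sketch of a Kantorovich-Rubinstein dual estimate of the
# 1-Wasserstein distance between two batches of sampled next-states.
# Network shape, clip value, and optimizer settings are illustrative
# assumptions, not the thesis's actual implementation.
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Scalar critic f, kept (approximately) 1-Lipschitz by weight clipping."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def wasserstein_estimate(critic, samples_p, samples_q, opt, clip=0.05, steps=5):
    """Maximize E_P[f] - E_Q[f] over f; the maximum approximates W1(P, Q)."""
    for _ in range(steps):
        gap = critic(samples_p).mean() - critic(samples_q).mean()
        opt.zero_grad()
        (-gap).backward()                 # gradient ascent on the dual objective
        opt.step()
        for w in critic.parameters():     # crude Lipschitz constraint
            w.data.clamp_(-clip, clip)
    with torch.no_grad():
        return (critic(samples_p).mean() - critic(samples_q).mean()).item()

critic = Critic(state_dim=4)
opt = torch.optim.RMSprop(critic.parameters(), lr=5e-4)
w1 = wasserstein_estimate(critic, torch.randn(256, 4), torch.randn(256, 4) + 1.0, opt)
```

Weight clipping is the bluntest way to approximate the Lipschitz constraint; a gradient penalty is a smoother, commonly used alternative.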
Item
Towards Robust and Adaptable Real-World Reinforcement Learning (2023)
Sun, Yanchao; Huang, Furong; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The past decade has witnessed rapid development of reinforcement learning (RL) techniques. However, there is still a gap between employing RL in simulators and applying RL models to challenging and diverse real-world systems. On the one hand, existing RL approaches have been shown to be fragile under perturbations in the environment, making it risky to deploy RL models in real-world applications where unexpected noise and interference exist. On the other hand, most RL methods focus on learning a policy in a fixed environment and must re-train the policy if the environment changes. For real-world environments whose agent specifications and dynamics can be ever-changing, these methods become less practical, as they require a large amount of data and computation to adapt to a changed environment. We focus on these two challenges and introduce multiple solutions to improve the robustness and adaptability of RL methods. For robustness, we propose a series of approaches that define, explore, and mitigate the vulnerability of RL agents from different perspectives, achieving state-of-the-art performance in robustifying RL policies. For adaptability, we present transfer learning and pretraining frameworks to address challenging multi-task learning problems that are important yet rarely studied, contributing to the application of RL techniques to more real-life scenarios.

Item
Effects of Action-Outcome Agency on Feedback Processing (2016)
Tootell, Anne; Bernat, Edward; Psychology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The current study investigated the effect of action-outcome agency, or one's ability to guide behavior during reinforcement learning, on reward and loss processing in a gambling task. Thirty undergraduates (13 females; M age = 19.57, SD = 2.18) completed two computer gambling tasks, one designed to exhibit a high level of action-outcome agency and one a low level, while attached to a 128-channel EEG system. Time-frequency event-related potential (TF-ERP) analysis was conducted on the acquired EEG data. ERP components associated with reward and loss processing were significantly dampened in the low action-outcome agency task relative to the high action-outcome agency task. Interestingly, TF-ERP analysis demonstrated a significant effect of action-outcome agency on gain-loss differences in theta but not delta frequencies, suggesting a more central role of loss processing in the guidance of goal-directed behavior. These results challenge components of the well-established predicted response-outcome (PRO) model of reinforcement learning.
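As a rough illustration of the kind of band-limited analysis a time-frequency ERP pipeline performs, the sketch below extracts delta- and theta-band power envelopes from a single epoched EEG channel. The sampling rate, band edges, and filter order are assumptions; the study's actual pipeline is not described here.

```python
# Illustrative sketch of extracting delta- and theta-band power envelopes
# from one epoched EEG channel, in the spirit of a time-frequency analysis.
# Sampling rate, band edges, and filter order are assumptions, not the
# study's actual pipeline.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_power(epoch: np.ndarray, fs: float, lo: float, hi: float) -> np.ndarray:
    """Band-pass the epoch, then return its instantaneous power envelope."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, epoch)
    return np.abs(hilbert(filtered)) ** 2

fs = 250.0                               # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
epoch = rng.standard_normal(int(fs))     # stand-in for one 1 s feedback-locked epoch
delta = band_power(epoch, fs, 1.0, 4.0)  # delta band, ~1-4 Hz
theta = band_power(epoch, fs, 4.0, 8.0)  # theta band, ~4-8 Hz
# In practice one would average such envelopes over gain vs. loss trials
# and compare conditions, e.g., theta-band gain-loss differences.
```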
Item
SEQUENTIAL DECISIONS AND PREDICTIONS IN NATURAL LANGUAGE PROCESSING (2016)
He, He; Daume III, Hal; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Natural language processing has achieved great success in a wide range of applications, producing both commercial language services and open-source language tools. However, most methods take a static or batch approach, assuming that the model has all the information it needs and makes a one-time prediction. In this dissertation, we study dynamic problems where the input arrives in a sequence instead of all at once, and the output must be produced while the input is arriving. In these problems, predictions are often made based only on partial information. We see this dynamic setting in many real-time, interactive applications. These problems usually involve a trade-off between the amount of input received (cost) and the quality of the output prediction (accuracy). Therefore, the evaluation considers both objectives (e.g., plotting a Pareto curve). Our goal is to develop a formal understanding of sequential prediction and decision-making problems in natural language processing and to propose efficient solutions. Toward this end, we present meta-algorithms that take an existing batch model and produce a dynamic model to handle sequential inputs and outputs. We build our framework upon the theory of Markov Decision Processes (MDPs), which allows learning to trade off competing objectives in a principled way. The main machine learning techniques we use are from imitation learning and reinforcement learning, and we advance current techniques to tackle problems arising in our settings. We evaluate our algorithms on a variety of applications, including dependency parsing, machine translation, and question answering. We show that our approach achieves a better cost-accuracy trade-off than batch approaches and heuristic-based decision-making approaches. We first propose a general framework for cost-sensitive prediction, where different parts of the input come at different costs. We formulate a decision-making process that selects pieces of the input sequentially, with the selection adaptive to each instance. Our approach is evaluated on both standard classification tasks and a structured prediction task (dependency parsing). We show that it achieves prediction quality similar to methods that use all of the input, while incurring a much smaller cost. Next, we extend the framework to problems where the input is revealed incrementally in a fixed order. We study two applications: simultaneous machine translation and quiz bowl (incremental text classification). We discuss challenges in this setting and show that adding domain knowledge eases the decision-making problem. A central theme throughout the chapters is an MDP formulation of a challenging problem with sequential input/output and trade-off decisions, accompanied by a learning algorithm that solves the MDP.
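The dissertation's recurring template, an MDP whose agent trades input cost against prediction accuracy, can be caricatured in a few lines. In the sketch below, the threshold policy, step cost, and stub classifier are hypothetical stand-ins, not the learned policies or models from the dissertation.

```python
# Toy rendering of a sequential-prediction MDP: input arrives token by
# token, and at each step the agent either WAITs for more input or
# PREDICTs now, trading input consumed (cost) against accuracy.
# The threshold policy, step cost, and stub classifier are hypothetical.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Outcome:
    correct: bool
    tokens_used: int

def run_episode(tokens: List[str],
                classify: Callable[[List[str]], Tuple[str, float]],
                answer: str,
                threshold: float = 0.8,
                step_cost: float = 0.01) -> Tuple[float, Outcome]:
    """Threshold policy: PREDICT once classifier confidence clears `threshold`."""
    reward = 0.0
    for t in range(1, len(tokens) + 1):
        guess, conf = classify(tokens[:t])      # prediction from the current prefix
        reward -= step_cost                     # cost of consuming one more token
        if conf >= threshold or t == len(tokens):
            correct = guess == answer
            reward += 1.0 if correct else -1.0  # terminal accuracy reward
            return reward, Outcome(correct, t)
    raise ValueError("empty input")

# Stub classifier whose confidence grows with prefix length.
def classify_stub(prefix: List[str]) -> Tuple[str, float]:
    return "unknown", min(1.0, 0.1 * len(prefix))

reward, outcome = run_episode("who wrote the iliad".split(), classify_stub, "unknown")
```

A learned policy would replace the fixed confidence threshold with a value function or imitation-trained decision rule over the same state (prefix, classifier output) and actions (wait, predict).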
Item
Industrial Flexibility in Theory and Practice (2009)
Reindorp, Matthew; Fu, Michael; Goyal, Manu; Business and Management: Decision & Information Technologies; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
At the heart of any decision problem is some degree of "flexibility" in how to act. Most often, we aim to extract the greatest possible value from this inherent flexibility. The three essays compiled here are aligned with this same general aim, but we have an important secondary concern: to highlight the value of flexibility itself in the various situations we study. In the first essay, we consider the timing of an action: when to replace obsolete subsystems within an extensive, complex infrastructure. Such replacement action, known as capital renewal, must balance uncertainty about future profitability against uncertainty about future renewal costs. Treating renewal investments as real options, we derive the unique, closed-form optimal solution to the infinite-horizon version of this problem and determine the total present value of an institution's capital renewal options. We investigate the sensitivity of the solution to variations in key problem parameters. The second essay addresses the promising of lead times in a make-to-order environment, complicated by the need to serve multiple customer classes with differing priority levels. We tackle this problem with a "model-free" approach: after preparing a discrete-event simulation of a make-to-order production system, we determine a policy for lead-time promising through application of a reinforcement learning algorithm. The third essay presents an empirical analysis of new product launches in the automotive industry, showing that manufacturing flexibility is one key indicator of superior productivity during launch. We explore the financial dimensions of the apparent productivity differences and show that the use of flexible manufacturing increases an automobile plant's likelihood of being chosen to host a new product launch.

Item
RESOURCE AND ENVIRONMENT AWARE SENSOR COMMUNICATIONS: FRAMEWORK, OPTIMIZATION, AND APPLICATIONS (2005-12-02)
Pandana, Charles; Liu, K. J. Ray; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Recent advances in low-power integrated circuit devices, micro-electro-mechanical system (MEMS) technologies, and communications technologies have made possible the deployment of low-cost, low-power sensors that can be integrated to form wireless sensor networks (WSN). These wireless sensor networks have many important applications: from battlefield surveillance to modern highway and industrial monitoring systems, and from emergency rescue to early forest fire detection and sophisticated earthquake early detection systems. Given this broad range of applications, sensor networks are becoming an integral part of human lives. However, successful deployment of a sensor network depends on the reliability of the network itself. There are many challenging problems in making a deployed network more reliable. These problems include, but are not limited to, extending the network lifetime, increasing the throughput of each sensor node, collecting information efficiently, and enforcing collaboration among nodes to accomplish certain network tasks. One important design requirement is that algorithms be completely distributed and scalable, which poses a tremendous challenge in designing optimal algorithms for sensor networks. This thesis addresses various challenging issues encountered in wireless sensor networks. The most important goal in sensor networks is to prolong the network lifetime, but the stringent energy budget demands highly energy-efficient resource allocation, which in turn requires energy awareness throughout the system. In fact, we envision a broader resource- and environment-aware optimization of sensor networks: a framework that reconfigures parameters across communication layers according to the available resources and the environment. We first investigate the application of online reinforcement learning to modulation and transmit power selection. We analyze the effectiveness of the learning algorithm using the effective goodput successfully delivered per unit energy as the metric; this metric captures how efficiently energy is used in sensor communication. In many practical sensor scenarios, maximizing the energy efficiency of a single sensor node may not be sufficient. Therefore, we continue with the routing problem of maximizing the number of delivered packets before the network becomes useless, i.e., before the remaining network disintegrates. We design a class of energy-efficient routing algorithms that explicitly takes the connectivity of the remaining network into account, and we present a distributed, asynchronous routing implementation based on a reinforcement learning algorithm. This work can be viewed as distributed connectivity-aware energy-efficient routing. We then explore the advantages of cooperative routing for network lifetime maximization and propose a power allocation for cooperative routing, called maximum lifetime power allocation, that takes the residual energy of the nodes into account when cooperating: our criterion lets nodes with more energy help more than nodes with less. We continue by looking at the problem of cooperation enforcement in ad-hoc networks and show that combining a repeated game with a self-learning algorithm yields a better cooperation point. Finally, we demonstrate an example of a channel-aware application for multimedia communication. In all case studies, we employ optimization schemes equipped with resource and environment awareness. We hope that the proposed resource- and environment-aware optimization framework will serve as a first step towards the realization of intelligent sensor communications.
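As a toy rendering of the first case study, online learning of a (modulation, power) choice scored by goodput per unit energy, consider the bandit-style sketch below. The action set, channel model, and reward shaping are invented for illustration and are not the thesis's actual formulation.

```python
# Bandit-style sketch of online learning over (modulation, power) actions,
# scored by delivered bits per unit energy. The action set, toy channel
# model, and reward are invented for illustration, not the thesis's
# actual formulation.
import random
from collections import defaultdict

ACTIONS = [(m, p) for m in ("BPSK", "QPSK", "16QAM") for p in (1, 5, 10)]  # power in mW
BITS = {"BPSK": 1, "QPSK": 2, "16QAM": 4}       # bits per symbol

def success_prob(mod: str, power: int, snr: float) -> float:
    """Toy link model: denser constellations fail more often; power helps."""
    base = {"BPSK": 0.9, "QPSK": 0.7, "16QAM": 0.5}[mod]
    return min(1.0, base * (power / 10) ** 0.3 * snr)

Q = defaultdict(float)
alpha, eps = 0.1, 0.1
state = "high_snr"
for _ in range(10_000):
    snr = 1.0 if state == "high_snr" else 0.5
    if random.random() < eps:                   # epsilon-greedy exploration
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    mod, power = action
    delivered = BITS[mod] if random.random() < success_prob(mod, power, snr) else 0
    reward = delivered / power                  # goodput per unit energy
    Q[(state, action)] += alpha * (reward - Q[(state, action)])
    state = random.choice(["high_snr", "low_snr"])  # toy channel dynamics
```

Under this reward, the learner gravitates toward the (modulation, power) pair that delivers the most bits per milliwatt in each channel state, which is the energy-awareness idea the metric is meant to capture.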