Acting, Planning, and Learning Using Hierarchical Operational Models

Patra, Sunandita

Acting, Planning, and Learning Using Hierarchical Operational Models

Files

Patra_umd_0117E_21050.pdf (5.48 MB)

No. of downloads: 237

Date

2020

Authors

Patra, Sunandita

Advisor

Nau, Dana

DRUM DOI

https://doi.org/10.13016/dhn4-gxki

Abstract

The most common representation formalisms for planning are descriptive models that abstractly describe what the actions do and are tailored for efficiently computing the next state(s) in a state-transition system. However, real-world acting requires operational models that describe how to do things, with rich control structures for closed-loop online decision-making in a dynamic environment. Use of a different action model for planning than the one used for acting causes problems with combining acting and planning, in particular for the development and consistency verification of the different models.

As an alternative, this dissertation defines and implements an integrated acting-and-planning system in which both planning and acting use the same operational models, which are written in a general-purpose hierarchical task-oriented language offering rich control structures.

The acting component, called Reactive Acting Engine (RAE), is inspired by the well-known PRS system, except that instead of being purely reactive, it can get advice from a planner. The dissertation also describes three planning algorithms which plan by doing several Monte Carlo rollouts in the space of operational models. The best of these three planners, Plan-with-UPOM uses a UCT-like Monte Carlo Tree Search procedure called UPOM (UCT Procedure for Operational Models), whose rollouts are simulated executions of the actor's operational models. The dissertation also presents learning strategies for use with RAE and UPOM that acquire from online acting experiences and/or simulated planning results, a mapping from decision contexts to method instances as well as a heuristic function to guide UPOM. The experimental results show that Plan-with-UPOM and the learning strategies significantly improve the acting efficiency and robustness of RAE. It can be proved that UPOM converges asymptotically by mapping its search space to an MDP. The dissertation also describes a real-world prototype of RAE and Plan-with-UPOM to defend software-defined networks, a relatively new network management architecture, against incoming attacks.

URI (handle)

http://hdl.handle.net/1903/26448

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations

Full item page