EXPERT-IN-THE-LOOP FOR SEQUENTIAL DECISIONS AND PREDICTIONS

dc.contributor.advisor: Daumé III, Hal
dc.contributor.author: Brantley, Kiante
dc.contributor.department: Computer Science
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2022-06-20T05:33:44Z
dc.date.available: 2022-06-20T05:33:44Z
dc.date.issued: 2021
dc.description.abstract: Sequential decision and prediction problems are common in natural language processing, robotics, and video games. In these problems, an agent interacts with an environment to learn how to solve a particular task. Research in sequential decisions and predictions has grown, due in part to the success of reinforcement learning. However, this success has come at the cost of algorithms that are very data-inefficient, making learning in the real world difficult. Our primary goal is to make these algorithms more data-efficient by using an expert in the loop (e.g., imitation learning). Imitation learning is a technique for using an expert in sequential decision and prediction problems. Naive imitation learning suffers from covariate shift (i.e., the training distribution differs from the test distribution). We propose methods that address this issue, along with other issues that arise in different styles of imitation learning. In particular, we study three broad areas of using an expert in the loop for sequential decisions and predictions. First, we study the most popular category of imitation learning, interactive imitation learning. Although interactive imitation learning addresses the covariate shift problem of naive imitation, it does so at a cost: it assumes access to an online interactive expert, which is often unrealistic. We propose a setting where this assumption is realistic and attempt to reduce the number of queries made to the interactive expert. Second, we introduce a new category of imitation learning algorithms, reward-learning imitation learning. Unlike interactive imitation learning, these algorithms address covariate shift using only demonstration data, without querying an online interactive expert. They instead assume access to an underlying reinforcement learning algorithm that can optimize a reward function learned from the demonstration data. We benchmark all algorithms in this category and relate them to modern structured prediction problems in NLP. Third, some problems cannot be naturally expressed or solved by these two categories of algorithms, for example, learning a policy that solves a task while also satisfying safety constraints. We introduce expert-in-the-loop techniques that extend beyond traditional imitation learning paradigms, in which the expert provides demonstration features or constraints instead of state-action pairs.
dc.identifier: https://doi.org/10.13016/xacc-fxv3
dc.identifier.uri: http://hdl.handle.net/1903/28885
dc.language.iso: en
dc.subject.pqcontrolled: Artificial intelligence
dc.subject.pquncontrolled: Imitation Learning
dc.subject.pquncontrolled: Machine Learning
dc.subject.pquncontrolled: Natural Language Processing
dc.subject.pquncontrolled: Reinforcement Learning
dc.title: EXPERT-IN-THE-LOOP FOR SEQUENTIAL DECISIONS AND PREDICTIONS
dc.type: Dissertation
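
The abstract's central contrast, naive imitation learning versus interactive imitation learning, can be made concrete with a small sketch. The Python example below is not from the dissertation: the toy chain environment, expert_action, and fit_tabular are invented here purely to illustrate covariate shift in behavioral cloning and a DAgger-style interactive fix (Ross et al., 2011), where the expert is queried on the states the learner itself visits.

# A minimal, hypothetical sketch (not from the dissertation): a 1-D chain of
# states 0..N-1 with the goal at state N-1. Actions: 0 = left, 1 = right.
N = 10
HORIZON = 5

def expert_action(s):
    # The expert always moves right, toward the goal.
    return 1

def step(s, a):
    return max(0, min(N - 1, s + (1 if a == 1 else -1)))

def rollout(policy, start=0, horizon=HORIZON):
    # Run a policy for `horizon` steps and return the states it visits.
    s, visited = start, []
    for _ in range(horizon):
        visited.append(s)
        s = step(s, policy(s))
    return visited

def fit_tabular(pairs):
    # "Train" by memorizing state -> action labels; unseen states fall back
    # to action 0 (left), which makes covariate shift visible.
    table = dict(pairs)
    return lambda s: table.get(s, 0)

# Naive imitation (behavioral cloning): labels come only from the states the
# expert itself visits (states 0 through 4 here).
bc_data = [(s, expert_action(s)) for s in rollout(expert_action)]
bc_policy = fit_tabular(bc_data)
print(rollout(bc_policy, start=6))  # [6, 5, 4, 5, 4]: oscillates off-distribution

# DAgger-style interactive imitation: roll out the *learner*, then query the
# expert for the correct action in each state the learner actually reached.
data = list(bc_data)
learner = fit_tabular(data)
for _ in range(3):
    for start in range(N):
        for s in rollout(learner, start=start):
            data.append((s, expert_action(s)))  # one expert query per state
    learner = fit_tabular(data)
print(rollout(learner, start=6))    # [6, 7, 8, 9, 9]: heads to the goal

Note that the interactive fix pays for its robustness with expert labels on every learner-visited state; that query cost is what the dissertation's first contribution aims to reduce, while the reward-learning category avoids online queries altogether by learning a reward from demonstration data alone.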

Files

Original bundle (1 of 1)
Name: Brantley_umd_0117E_22232.pdf
Size: 3.78 MB
Format: Adobe Portable Document Format