EXPERT-IN-THE-LOOP FOR SEQUENTIAL DECISIONS AND PREDICTIONS

dc.contributor.advisor: Daumé III, Hal
dc.contributor.author: Brantley, Kiante
dc.contributor.department: Computer Science
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2022-06-20T05:33:44Z
dc.date.available: 2022-06-20T05:33:44Z
dc.date.issued: 2021
dc.description.abstract: Sequential decision and prediction problems are common in natural language processing, robotics, and video games. In these problems, an agent interacts with an environment to learn how to solve a particular task. Research in sequential decisions and predictions has grown, due in part to the success of reinforcement learning. However, this success has come at the cost of algorithms that are very data-inefficient, making learning in the real world difficult. Our primary goal is to make these algorithms more data-efficient by using an expert in the loop (e.g., imitation learning). Imitation learning is a technique for using an expert in sequential decision and prediction problems. Naive imitation learning suffers from covariate shift (i.e., the training distribution differs from the test distribution). We propose methods that address this issue, along with other issues that arise in different styles of imitation learning. In particular, we study three broad areas of using an expert in the loop for sequential decisions and predictions. First, we study the most popular category of imitation learning, interactive imitation learning. Although interactive imitation learning addresses the covariate shift problem of naive imitation, it does so at a cost: it assumes access to an online interactive expert, which is often unrealistic. We propose a setting where this assumption is realistic and attempt to reduce the number of queries made to the interactive expert. Second, we introduce a new category of imitation learning algorithms, reward-learning imitation learning. Unlike interactive imitation learning, these algorithms address covariate shift using only demonstration data, without querying an online interactive expert. They instead assume access to an underlying reinforcement learning algorithm that can optimize a reward function learned from the demonstration data. We benchmark all algorithms in this category and relate them to modern structured prediction problems in NLP. Third, some problems cannot be naturally expressed or solved by these two categories of algorithms, for example, learning a policy that solves a task while also satisfying safety constraints. We introduce expert-in-the-loop techniques that extend beyond traditional imitation learning paradigms, in which the expert provides demonstration features or constraints instead of state-action pairs.
dc.identifier: https://doi.org/10.13016/xacc-fxv3
dc.identifier.uri: http://hdl.handle.net/1903/28885
dc.language.iso: en
dc.subject.pqcontrolled: Artificial intelligence
dc.subject.pquncontrolled: Imitation Learning
dc.subject.pquncontrolled: Machine Learning
dc.subject.pquncontrolled: Natural Language Processing
dc.subject.pquncontrolled: Reinforcement Learning
dc.title: EXPERT-IN-THE-LOOP FOR SEQUENTIAL DECISIONS AND PREDICTIONS
dc.type: Dissertation
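
The abstract's central contrast, naive imitation learning versus interactive imitation learning, can be made concrete with a small sketch. The Python example below is not from the dissertation: the toy chain environment, expert_action, and fit_tabular are invented here purely to illustrate covariate shift in behavioral cloning and a DAgger-style interactive fix (Ross et al., 2011), where the expert is queried on the states the learner itself visits.

# A minimal, hypothetical sketch (not from the dissertation): a 1-D chain of
# states 0..N-1 with the goal at state N-1. Actions: 0 = left, 1 = right.
N = 10
HORIZON = 5

def expert_action(s):
    # The expert always moves right, toward the goal.
    return 1

def step(s, a):
    return max(0, min(N - 1, s + (1 if a == 1 else -1)))

def rollout(policy, start=0, horizon=HORIZON):
    # Run a policy for `horizon` steps and return the states it visits.
    s, visited = start, []
    for _ in range(horizon):
        visited.append(s)
        s = step(s, policy(s))
    return visited

def fit_tabular(pairs):
    # "Train" by memorizing state -> action labels; unseen states fall back
    # to action 0 (left), which makes covariate shift visible.
    table = dict(pairs)
    return lambda s: table.get(s, 0)

# Naive imitation (behavioral cloning): labels come only from the states the
# expert itself visits (states 0 through 4 here).
bc_data = [(s, expert_action(s)) for s in rollout(expert_action)]
bc_policy = fit_tabular(bc_data)
print(rollout(bc_policy, start=6))  # [6, 5, 4, 5, 4]: oscillates off-distribution

# DAgger-style interactive imitation: roll out the *learner*, then query the
# expert for the correct action in each state the learner actually reached.
data = list(bc_data)
learner = fit_tabular(data)
for _ in range(3):
    for start in range(N):
        for s in rollout(learner, start=start):
            data.append((s, expert_action(s)))  # one expert query per state
    learner = fit_tabular(data)
print(rollout(learner, start=6))    # [6, 7, 8, 9, 9]: heads to the goal

Note that the interactive fix pays for its robustness with expert labels on every learner-visited state; that query cost is what the dissertation's first contribution aims to reduce, while the reward-learning category avoids online queries altogether by learning a reward from demonstration data alone.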

Files

Original bundle (1 of 1)
Name: Brantley_umd_0117E_22232.pdf
Size: 3.78 MB
Format: Adobe Portable Document Format