Efficient Non-deterministic Search in Structured Prediction: A Case Study on Syntactic Parsing
Publication or External Link
Non-determinism occurs naturally in many search-based machine learning and natural language processing (NLP) problems. For example, the goal of parsing is to construct the syntactic tree structure of a sentence given a grammar. Agenda-based parsing is a dynamic programming approach to find the most likely syntactic tree of a sentence according to a probabilistic grammar. A chart is used to maintain all the possible subtrees for different spans in the sentence and an agenda is used to rank all the constituents. The parser chooses only one constituent from the agenda per step. Non-determinism occurs naturally in agenda-based parsing since the new constituent is often built by combining items from a few steps earlier.
Unfortunately, like most other problems in NLP, the size of the search space is huge and exhaustive search is impossible. However, users expect a fast and accurate system. In this dissertation, I focus on the question of
Why, when, and how shall we take advantage of non-determinism?'' and show its efficacy to improve the parser in terms of speed and/or accuracy. Existing approaches like search-based imitation learning or reinforcement learning methods have different limitations when it comes to a large NLP system. The solution proposed in this dissertation is We should train the system non-deterministically and test it deterministically if possible.'' and I also show that ``it is better to learn with oracles than simple heuristics''.
We start by solving a generic Markov Decision Process with a non-deterministic agent. We show its theoretical convergence guarantees and verify its efficiency on maze solving problems. Then we focus on agenda-based parsing. To re-prioritize the parser, we model a decoding problem as a Markov Decision Process with a large state/action space. We discuss the advantages/disadvantages of existing techniques and propose a hybrid reinforcement/apprenticeship learning algorithm to trade off speed and accuracy. We also propose to use a dynamic pruner with features that depend on the run-time status of the chart and agenda and analyze the importance of those features in the pruning classification. Our models show comparable results with respect to start-of-the-art strategies.