TOWARDS PRACTICAL COMPLEX QUESTION ANSWERING
Abstract
Question answering (QA) is one of the most important and challenging tasks for understanding human language. With the help of large-scale benchmarks, there has been tremendous success in building neural QA systems, and such progress has been deployed in commercial systems like search engines. However, most QA systems target rather simple questions that can be answered with a single piece of evidence (e.g., a sentence). In many real scenarios, users also ask complex questions that require multiple evidence pieces, and search engines fail to answer them. The goal of this dissertation is to tackle the complex QA problem from different angles.

We first study complex QA using text collections as the knowledge source. We build two QA systems that rely on a free-text knowledge graph constructed from Wikipedia. By extracting a question-grounded sub-graph and using a graph neural network to reason over it, the proposed QA systems achieve state-of-the-art results on multiple complex QA benchmarks.

We then present two solutions that address key assumptions which make state-of-the-art QA systems difficult to generalize beyond specific benchmarks. The first is the assumption that the given text collection is semi-structured with hyperlinks. We propose a multi-step dense retrieval method that models the implicit relationships between evidence pieces. The retriever is competitive with state-of-the-art systems on complex QA benchmarks without using any semi-structured information. To further address the assumption that annotated evidence labels are available during training, we focus on the weakly-supervised setting, where only question-answer pairs are given. We propose an iterative approach that improves a weak retriever by alternately finding evidence with the up-to-date model and encouraging the model to learn the most likely evidence. Without using any evidence labels, our approach is on par with fully-supervised counterparts.

We also study complex QA using tables as the knowledge source. We focus on a practical problem that is overlooked by existing benchmarks: domain generalization of mathematical operations over columns. We first construct benchmarks to quantify this problem, and then address it by incorporating the necessary domain knowledge through table schema preprocessing. Our approach significantly outperforms baselines on this problem and, as a result, boosts overall performance.
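
The iterative weakly-supervised training described in the abstract can be read as a hard-EM-style loop: mine likely evidence with the current retriever, then train the retriever to prefer that evidence. The sketch below is illustrative only; the retriever interface (retrieve, train_step) and the answer_in heuristic are hypothetical placeholders, not the dissertation's actual implementation.

from typing import Iterable, List, Tuple


def answer_in(passage: str, answer: str) -> bool:
    # Crude proxy for "evidence consistent with the answer":
    # a passage counts as positive if it contains the answer string.
    return answer.lower() in passage.lower()


def iterative_weak_supervision(retriever,
                               qa_pairs: Iterable[Tuple[str, str]],
                               num_rounds: int = 3,
                               top_k: int = 20):
    """Alternate between mining evidence with the up-to-date retriever
    (E-step) and training it on the mined evidence (M-step)."""
    for _ in range(num_rounds):
        mined: List[Tuple[str, str, List[str]]] = []
        for question, answer in qa_pairs:
            # Let the current model propose candidate evidence passages.
            candidates = retriever.retrieve(question, top_k=top_k)
            # Keep the highest-ranked candidate consistent with the answer
            # as the "most likely evidence" for this question.
            positives = [c for c in candidates if answer_in(c, answer)]
            if positives:
                mined.append((question, positives[0], candidates))
        # Encourage the model to rank the mined evidence above the rest.
        for question, positive, candidates in mined:
            negatives = [c for c in candidates if c != positive]
            retriever.train_step(question, positive, negatives)
    return retriever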