QBLink: A Dataset for Sequential Open-Domain Question Answering

Ahmed Elgohary, Chen Zhao, and Jordan Boyd-Graber. 2018. Dataset and baselines for sequential open-domain question answering. In Proceedings of Empirical Methods in Natural Language Processing.

DRUM DOI

https://doi.org/10.13016/t92u-mpwn

Abstract

We introduce QBLink, a new dataset of about 18,000 question sequences. Each sequence consists of three naturally occurring, human-authored questions (around 56,000 unique questions in total). The sequences themselves are also naturally occurring (i.e., we do not artificially combine individually authored questions to form sequences). This lets us focus on the connections between questions that a system should exploit to improve end-to-end question answering accuracy. QBLink is based on the bonus questions of Quiz Bowl tournaments. Previous work uses only the starter (or tossup) questions. Bonus questions, by contrast, are not interruptible (players always hear the complete question) and vary more in difficulty. Each bonus question starts with a lead-in text that sets the stage for the rest of the question, followed by a sequence of related questions.
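The structure described in the abstract (a lead-in followed by three related questions) can be sketched as a small data type. This is a minimal, hypothetical illustration, not the released QBLink file format. The class names `BonusPart` and `BonusSequence`, their fields, and the `contexts` helper are all assumptions. The sketch also assumes that, as in live play where the moderator reads each answer before the next part, the answers to earlier questions are available as context for later ones.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class BonusPart:
    # Hypothetical fields: one question in the sequence and its answer.
    question: str
    answer: str


@dataclass
class BonusSequence:
    # Hypothetical representation of one QBLink sequence; not the official schema.
    lead_in: str
    parts: List[BonusPart]

    def __post_init__(self) -> None:
        # The abstract states that each sequence has exactly three questions.
        if len(self.parts) != 3:
            raise ValueError("each sequence must contain exactly three questions")

    def contexts(self) -> Iterator[Tuple[List[str], str]]:
        """Yield (history, question) pairs.

        The history for each question is the lead-in plus every earlier
        question joined with its answer (an assumption about what a
        sequential QA system may condition on).
        """
        history = [self.lead_in]
        for part in self.parts:
            # Copy the history so later appends do not change earlier yields.
            yield list(history), part.question
            history.append(f"{part.question} {part.answer}")


if __name__ == "__main__":
    # Placeholder strings only; no real QBLink data is reproduced here.
    seq = BonusSequence(
        "lead-in text",
        [BonusPart("question 1", "answer 1"),
         BonusPart("question 2", "answer 2"),
         BonusPart("question 3", "answer 3")],
    )
    for history, question in seq.contexts():
        print(len(history), question)
```

Passing the growing history alongside each question is one simple way to expose the between-question connections the abstract emphasizes; an actual model might encode this context very differently.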

URI (handle)

http://hdl.handle.net/1903/27594

Collections

Computer Science Research Works
UMD Data Collection
