QBLink: A Dataset for Sequential Open-Domain Question Answering

Files

QBLink-train.json (24.75 MB)
No. of downloads: 21
QBLink-dev.json (1.98 MB)
No. of downloads: 18
QBLink-test.json (3.41 MB)
No. of downloads: 10474

Date

2018-11-03

Related Publication Citation

Ahmed Elgohary, Chen Zhao, and Jordan Boyd-Graber. 2018. Dataset and baselines for sequential open-domain question answering. In Proceedings of Empirical Methods in Natural Language Processing.

Abstract

We introduce QBLink, a new dataset of about 18,000 question sequences, each consisting of three naturally occurring, human-authored questions (around 56,000 unique questions in total). The sequences themselves are also naturally occurring (i.e., we do not artificially combine individually authored questions to form sequences), which allows us to focus on the important connections between questions that should be incorporated to improve end-to-end question answering accuracy. QBLink is based on the bonus questions of Quiz Bowl tournaments. Unlike the starter (or tossup) questions that previous work relies on exclusively, bonus questions are not interruptible (players always hear the complete question) and vary more widely in difficulty. Each bonus question starts with a lead-in text, which sets the stage for the rest of the question, followed by a sequence of related questions.
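
The structure described in the abstract (a lead-in followed by a sequence of related questions) can be pictured with a minimal Python sketch. The class names, field names, and the `contextualized_inputs` helper are hypothetical illustrations only; they are not the schema of the released JSON files and not the authors' baseline method. The strings in the usage example are placeholders, not dataset content.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Question:
    # Hypothetical field names, chosen for illustration only.
    text: str
    answer: str


@dataclass
class BonusSequence:
    # Lead-in text that sets the stage for the questions that follow.
    lead_in: str
    # The related questions, in the order they are asked.
    questions: List[Question]


def contextualized_inputs(seq: BonusSequence) -> List[str]:
    """Prefix each question with the lead-in and all earlier questions.

    This is one simple way (an assumption, not the paper's method) to
    expose the connections between questions in a sequence.
    """
    inputs = []
    history = [seq.lead_in]
    for question in seq.questions:
        inputs.append(" ".join(history + [question.text]))
        history.append(question.text)
    return inputs


# Usage with placeholder strings.
example = BonusSequence(
    lead_in="LEAD-IN",
    questions=[
        Question(text="Q1", answer="A1"),
        Question(text="Q2", answer="A2"),
        Question(text="Q3", answer="A3"),
    ],
)
for line in contextualized_inputs(example):
    print(line)
```

Each printed line carries the lead-in plus every question up to and including the current one, so a later question is never seen without the context that precedes it.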
