SQuAD2-CR: Semi-supervised annotation for cause and rationales for unanswerability in SQuAD 2.0

Gyeongbok Lee, Seung Won Hwang, Hyunsouk Cho

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Existing machine reading comprehension models are reported to be brittle to adversarially perturbed questions when optimized only for accuracy. This finding led to the creation of new reading comprehension benchmarks that contain such questions, such as SQuAD 2.0. However, despite the super-human accuracy of existing models on such datasets, it is still unclear how a model predicts the answerability of a question, potentially due to the absence of a shared annotation for the explanation. To address this gap, we release the SQuAD2-CR dataset, which contains annotations on unanswerable questions from the SQuAD 2.0 dataset, to enable an explanatory analysis of model predictions. Specifically, we annotate (1) an explanation of why the most plausible answer span cannot be the answer and (2) which part of the question causes unanswerability. We share intuitions and experimental results on how this dataset can be used to analyze and improve the interpretability of existing reading comprehension model behavior.

Original language: English
Title of host publication: LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
Editors: Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Publisher: European Language Resources Association (ELRA)
Pages: 5425-5432
Number of pages: 8
ISBN (Electronic): 9791095546344
Publication status: Published - 2020
Event: 12th International Conference on Language Resources and Evaluation, LREC 2020 - Marseille, France
Duration: 2020 May 11 to 2020 May 16

Publication series

Name: LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings

Conference

Conference: 12th International Conference on Language Resources and Evaluation, LREC 2020
Country: France
City: Marseille
Period: 20/5/11 to 20/5/16

Bibliographical note

Publisher Copyright:
© European Language Resources Association (ELRA), licensed under CC-BY-NC

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Education
  • Library and Information Sciences
  • Linguistics and Language

