Natural Language Understanding Wiki

SQuAD (the Stanford Question Answering Dataset) is a line of question-answering datasets created at Stanford. The first version was published in Rajpurkar et al. (2016)[1] and quickly became popular. However, results on this dataset soon surpassed human performance through the application of what Percy Liang has called "cheap tricks". Adversarial SQuAD (Jia and Liang, 2017[2]) and SQuAD 2.0 (Rajpurkar et al., 2018[3]) were created to evaluate higher-level inference skills.

Open-source software packages


  1. Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ Questions for Machine Comprehension of Text. In EMNLP 2016 (pp. 2383–2392).
  2. Jia, R., & Liang, P. (2017). Adversarial Examples for Evaluating Reading Comprehension Systems. In EMNLP 2017 (pp. 2021–2031).
  3. Rajpurkar, P., Jia, R., & Liang, P. (2018). Know What You Don’t Know: Unanswerable Questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 784–789). Association for Computational Linguistics.