Natural Language Understanding Wiki

Humans show great robustness in language comprehension. They can understand a sentence even when it is little more than a string of words without grammar, ignore wrong words, and interpret words so vague that they are no more than placeholders. So far, NLP systems have not demonstrated this ability.

From Croce et al. (2010)[1]: "Most of the employed learning algorithms are based on complex sets of syntagmatic features, as deeply investigated in (Johansson and Nugues, 2008b). The resulting recognition is thus highly dependent on the accuracy of the underlying parser, whereas wrong structures returned by the parser usually imply large misclassification errors."

From Pradhan (2006)[2]: "There is a significant drop in the precision and recall numbers for the AQUAINT test set [60-70%] (compared to the precision and recall numbers for the PropBank test set which were 82% and 78% respectively)."

TODO: PropBank --> Brown, etc.

Syntactic parser robustness: Hashemi and Hwa (2016)[3], Foster (2004[4], 2005[5], 2007[6])

POS tagging robustness: Gadde et al. (2011)[7]

TODO: Maity et al. (2016)[8]

Methodological issues

Should all NLP systems be tested for robustness?

Giménez and Màrquez (2004)[9] created an SVM-based POS tagger which uses discrete representations and cannot handle out-of-vocabulary words. The tagger gets state-of-the-art results on WSJ, but should it also have been tested in other settings? How representative is WSJ of news text? Of human language in general?
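One simple way to put a number on the question above is to measure how many test tokens never occur in the training data. The following is a minimal sketch, not taken from any of the cited papers; the function name `oov_rate` and the toy sentences are made up for illustration.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens whose surface form never appears in training."""
    vocab = set(train_tokens)
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


# Toy, invented data: a "news-like" training set and two test sets.
news_train = "the company said it will report earnings on monday".split()
news_test = "the company will report on monday".split()
tweet_test = "omg the company gonna report 2morrow lol".split()

print(oov_rate(news_train, news_test))   # in-domain test set
print(oov_rate(news_train, tweet_test))  # out-of-domain test set
```

A tagger that relies only on discrete word identities has no information about the unseen tokens, so a large gap between the two rates hints at how much an in-domain score may overstate performance elsewhere.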


Kummerfeld et al. (2012)[10] perform an interesting analysis of out-of-domain parsing (see Section 5.2).


Internet language and learner language

Reviews: Eisenstein (2013)[11], Plank (2016)[12]. An important quantitative analysis: Baldwin et al. (2013)[13]

POS tagging: Owoputi et al. (2013)[14], Ma et al. (2014)[15], Khan et al. (2013)[16]

Relation extraction: Augenstein (2016)[17]

Analysis of learner language (using Czech): Rosen (2016)[18]

Comments and discussions: Foster et al. (2011)[19]

Short messages: SMS, micro-blogs and queries

"Workshop on Machine Translation (WMT), recently devoted a shared task to this problem (Callison-Burch et al., 2011) consisting of text messages that were sent during the January 2010 earthquake in Haiti to an emergency response service. Participants were faced with a number of problems ranging from ‘text speak’ to the lack of punctuation (Eidelman et al., 2011)."

"Since most user-generated content documents tend to be rather short, which applies in particular to micro-blogs, it is difficult to interpret them in isolation and it is often beneficial to contextualise them in order to facilitate further analysis. In many cases it is possible to link micro-blog messages to full documents such as news articles (Guo et al., 2013). Alternatively, one can group or cluster different micro-blog messages together according to hidden properties, for example representing demographic characteristics (Bergsma and Van Durme, 2013)."

NER: Derczynski and Bontcheva (2014)[20], Derczynski et al. (2015)[21], Espinosa et al. (2016)[22], Fromreide and Søgaard (2014)[23], Onal and Karagoz (2015, for Turkish)[24], Schulz (2014, for Dutch)[25]

Polarity detection: Fersini et al. (2016)[26]

Syntax: Pinter et al. (2016)[27]

Event: Tan (2017)[28]


Automatic speech recognition output and dependency parsing, SRL: Favre et al. (2010)[29], Shrestha et al. (2015)[30]

Entity linking: Benton and Dredze (2015)[31]



Normalization: Liu et al. (2012)[32], Han et al. (2012)[33], Yang and Eisenstein (2013)[34], Li and Liu (2014)[35]

Ruiz et al. (2014, for Spanish)[36]

Limsopatham and Collier (2016)[37]

Čibej et al. (2016, for Slovene)[38]

From Baldwin and Li (2015)[39]: "In this work we build a taxonomy of normalization edits and present a study of normalization to examine its effect on three different downstream applications (dependency parsing, named entity recognition, and text-to-speech synthesis). The results suggest that how the normalization task should be viewed is highly dependent on the targeted application. The results also show that normalization must be thought of as more than word replacement in order to produce results comparable to those seen on clean text."
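To make "word replacement" concrete, here is a minimal sketch of dictionary-based lexical normalization. It is not Baldwin and Li's system; the lexicon `NORM_LEXICON` and the function `normalize` are hypothetical and exist only for illustration.

```python
# Hypothetical toy lexicon mapping non-standard forms to standard ones.
NORM_LEXICON = {
    "u": "you",
    "2morrow": "tomorrow",
    "gonna": "going to",
    "diffrnt": "different",
}


def normalize(tokens, lexicon=NORM_LEXICON):
    """Replace each token found in the lexicon; leave all others untouched."""
    out = []
    for tok in tokens:
        replacement = lexicon.get(tok.lower(), tok)
        # A replacement may expand to several words ("gonna" -> "going to").
        out.extend(replacement.split())
    return out


print(normalize("c u 2morrow".split()))
print(normalize("im gonna b late".split()))
```

Tokens missing from the lexicon ("c", "im", "b") pass through unchanged, and nothing here restores punctuation or inserts omitted words. Those are edits beyond word replacement, of the kind the quoted passage says normalization must also cover.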

Domain adaptation



  • TED talk treebank: Neubig et al. (2014)[40]
  • English Web Treebank (LDC),
  • Treebank of Learner English
  • Overview of the 2012 Shared Task on Parsing the Web (Petrov and McDonald, 2012)[43]
  • WikiDisc (French): Ho-dac and Laippala (2017)[44]


German: Bartz et al. (2013)[45], Sidarenka et al. (2013)[46], Laarmann-Quante and Dipper (2016)[47]

Open-source software

  • ClearNLP (Choi 2012)[48] -- POS tagging, syntactic parsing and SRL, tested on OntoNotes
    • Now continued as NLP4J; has SRL been dropped?

See also


  1. Croce, D., Giannone, C., Annesi, P., & Basili, R. (2010). Towards Open-Domain Semantic Role Labeling. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, (July), 237–246.
  2. Pradhan, S. S. (2006). Robust Semantic Role Labeling.
  3. Hashemi, Homa B., and Rebecca Hwa. 2016. "An Evaluation of Parser Robustness for Ungrammatical Sentences." EMNLP 2016.
  4. Foster, Jennifer. "Parsing Ungrammatical Input: an Evaluation Procedure." In LREC. 2004.
  5. Foster, J., 2005. Good reasons for noting bad grammar: Empirical investigations into the parsing of ungrammatical written English. Trinity College.
  6. Foster, Jennifer. "Treebanks gone bad." International Journal on Document Analysis and Recognition 10, no. 3 (2007): 129-145.
  7. Gadde, Phani, L. V. Subramaniam, and Tanveer A. Faruquie. "Adapting a WSJ trained part-of-speech tagger to noisy text: preliminary results." In Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data, p. 5. ACM, 2011.
  8. Maity, S., Chaudhary, A., Kumar, S., Mukherjee, A., Sarda, C., Patil, A. and Mondal, A., 2016, February. WASSUP? LOL: Characterizing Out-of-Vocabulary Words in Twitter. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion (pp. 341-344). ACM.
  9. Giménez, J., & Màrquez, L. (2004). SVMTool: A general POS tagger generator based on Support Vector Machines. Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC'04, (December), 43–46.
  10. Kummerfeld, J. K., Hall, D., Curran, J. R., & Klein, D. (2012). Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output. In EMNLP 2012 (pp. 1048–1059).
  11. Eisenstein, J., 2013, June. What to do about bad language on the internet. In HLT-NAACL (pp. 359-369).
  12. Plank, Barbara. "What to do about non-standard (or non-canonical) language in NLP." arXiv preprint arXiv:1608.07836 (2016).
  13. Baldwin, Timothy, Paul Cook, Marco Lui, Andrew MacKinlay, and Li Wang. "How noisy social media text, how diffrnt social media sources?." In IJCNLP, pp. 356-364. 2013.
  14. Owoputi, Olutobi, Brendan O'Connor, Chris Dyer, Kevin Gimpel, Nathan Schneider, and Noah A. Smith. "Improved part-of-speech tagging for online conversational text with word clusters." Association for Computational Linguistics, 2013.
  15. Ma, Ji, Yue Zhang, and Jingbo Zhu. "Tagging The Web: Building A Robust Web Tagger with Neural Network." In ACL (1), pp. 144-154. 2014.
  16. Khan, Mohammad, Markus Dickinson, and Sandra Kübler. "Towards Domain Adaptation for Parsing Web Data." In RANLP, pp. 357-364. 2013.
  17. Augenstein, I., 2016. Web Relation Extraction with Distant Supervision (Doctoral dissertation, University of Sheffield).
  18. Rosen, Alexandr. "Modeling non-standard language." GramLex 2016 (2016): 120.
  19. Foster, J., Wagner, J., Le Roux, J., Nivre, J., Hogan, D., & van Genabith, J. (2011). From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0. In IJCNLP 2011 (pp. 893–901).
  20. Derczynski, Leon, and Kalina Bontcheva. "Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Recognising Person Entities in Tweets." In EACL, pp. 69-73. 2014.
  21. Derczynski, Leon, Isabelle Augenstein, and Kalina Bontcheva. "Usfd: Twitter ner with drift compensation and linked data." arXiv preprint arXiv:1511.03088 (2015).
  22. Espinosa, Kurt Junshean, Riza Batista-Navarro, and Sophia Ananiadou. "Learning to recognise named entities in tweets by exploiting weakly labelled data." WNUT 2016 (2016): 153.
  23. Fromreide, Hege, and Anders Søgaard. "NER in Tweets Using Bagging and a Small Crowdsourced Dataset." In International Conference on Natural Language Processing, pp. 45-51. Springer International Publishing, 2014.
  24. Onal, Kezban Dilek, and Pinar Karagoz. "Named Entity Recognition from Scratch on Social Media." (2015).
  25. Schulz, Sarah. "Named entity recognition for user-generated content." ESSLLI 2014 Student Session (2014): 207.
  26. Fersini, Elisabetta, Enza Messina, and Federico Alberto Pozzi. "Expressive signals in social media languages to improve polarity detection." Information Processing & Management 52, no. 1 (2016): 20-35.
  27. Pinter, Yuval, Roi Reichart, and Idan Szpektor. "Syntactic parsing of web queries with question intent." In Proceedings of NAACL-HLT, pp. 670-680. 2016.
  28. Tan, Luchen. "Tracking Events in Social Media." (2017).
  29. Favre, Benoit, Bernd Bohnet, and Dilek Hakkani-Tür. "Evaluation of semantic role labeling and dependency parsing of automatic speech recognition output." In Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 5342-5345. IEEE, 2010.
  30. Shrestha, Niraj, Ivan Vulic, and Marie-Francine Moens. "Semantic role labeling of speech transcripts." In Lecture Notes in Computer Science, vol. 9042, pp. 583-595. Springer, 2015.
  31. Benton, Adrian, and Mark Dredze. "Entity Linking for Spoken Language." In HLT-NAACL, pp. 225-230. 2015.
  32. F. Liu, F. Weng, and X. Jiang. A broad-coverage normalization system for social media language. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, ACL '12, pages 1035–1044, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics.
  33. B. Han, P. Cook, and T. Baldwin. Automatically constructing a normalisation dictionary for microblogs. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL '12, pages 421–432, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics.
  34. Y. Yang and J. Eisenstein. A log-linear model for unsupervised text normalization. In EMNLP, pages 61–72, 2013.
  35. C. Li and Y. Liu. Improving text normalization via unsupervised model and discriminative reranking. ACL 2014, page 86, 2014.
  36. Ruiz, Pablo, Montse Cuadros, and Thierry Etchegoyhen. "Lexical Normalization of Spanish Tweets with Rule-Based Components and Language Models." Procesamiento del Lenguaje Natural (2014): 8.
  37. Limsopatham, Nut, and Nigel Collier. "Normalising medical concepts in social media texts by learning semantic representation." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Vol. 1. 2016.
  38. Čibej, Jaka, Darja Fišer, and Tomaž Erjavec. "Normalisation, tokenisation and sentence segmentation of Slovene tweets." Proceedings of Normalisation and Analysis of Social Media Texts (NormSoMe) (2016): 5-10.
  39. Baldwin, Tyler, and Yunyao Li. "An In-depth Analysis of the Effect of Text Normalization in Social Media." In HLT-NAACL, pp. 420-429. 2015.
  40. Neubig, Graham, Katsuhito Sudoh, Yusuke Oda, Kevin Duh, Hajime Tsukada, and Masaaki Nagata. "The naist-ntt ted talk treebank." In International Workshop on Spoken Language Translation. 2014.
  41. Silveira, Natalia, Timothy Dozat, Marie-Catherine De Marneffe, Samuel R. Bowman, Miriam Connor, John Bauer, and Christopher D. Manning. "A Gold Standard Dependency Corpus for English." In LREC, pp. 2897-2904. 2014.
  42. Daiber, Joachim, and Rob van der Goot. "The denoised web treebank: Evaluating dependency parsing under noisy input conditions." LREC, 2016.
  43. Petrov, Slav, and Ryan McDonald. "Overview of the 2012 shared task on parsing the web." In Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL), vol. 59. 2012.
  44. Ho-Dac, L. M., and Veronika Laippala. "Le corpus WikiDisc: ressource pour la caractérisation des discussions en ligne." (2017): 107-124.
  45. Bartz, T., Beißwenger, M., and Storrer, A. (2013). Optimierung des Stuttgart-Tübingen-Tagset für die linguistische Annotation von Korpora zur internetbasierten Kommunikation: Phänomene, Herausforderungen, Erweiterungsvorschläge. JLCL, 28(1):157–198.
  46. Sidarenka, U., Scheffler, T., and Stede, M. (2013). Rule-based normalization of German Twitter messages. In Proceedings of the GSCL Workshop Verarbeitung und Annotation von Sprachdaten aus Genres internetbasierter Kommunikation.
  47. Laarmann-Quante, Ronja, and Stefanie Dipper. "An Annotation Scheme for the Comparison of Different Genres of Social Media with a Focus on Normalization." In Normalisation and Analysis of Social Media Texts (NormSoMe) Workshop Programme, p. 23. LREC 2016.
  48. Choi, Jinho D. "Optimization of natural language processing components for robustness and scalability." (2012).
  49. Plank, B., Martinez Alonso, H., & Søgaard, A. (2015). Non-canonical language is not harder to annotate than canonical language. Proceedings of the 9th Linguistic Annotation Workshop (LAW IX), 148–151.
  50. Farzindar, A., & Inkpen, D. (2015). Natural language processing for social media. Synthesis Lectures on Human Language Technologies (Vol. 8). Morgan & Claypool Publishers.