Natural Language Understanding Wiki

Two main streams:

Transition-based parsing[]

Transition systems[]

  • Arc-standard
  • Arc-eager
  • Arc-hybrid (Kuhlmann et al., 2011)[2]
  • Swap-standard
  • Extended arc-standard (Attardi, 2006)[3]

Feature selection[]

  • Traditional feature templates (TODO)
  • Primitive, embedded features: Chen and Manning (2014)[4]
  • Stack LSTM: Dyer et al. (2015)[5]
  • Bi-directional LSTMs: Kiperwasser and Goldberg (2016)[6]

List-based (non-directional) systems[]

  • Easy-first (Goldberg and Elhadad, 2010)[7]
  • Easy-first with Hierarchical Tree LSTMs (Kiperwasser & Goldberg, 2016)[8]
  • Dependency parsing as head selection, using bi-directional LSTMs (Zhang et al. 2016)[9]

Graph-based systems[]

TODO: Yu and Bohnet (2016) PDF


From Yazdani & Henderson (2015)[10]: "To train a classifier to choose the best actions, previous work has proposed memory-based classifiers (Nivre et al., 2004), SVMs (Nivre et al., 2006), structured perceptron (Huang et al., 2012; Zhang and Clark, 2011), two-layer neural networks (Chen and Manning, 2014)[4], and Incremental Sigmoid Belief Networks (ISBN) (Titov and Henderson, 2007b), amongst other approaches."

Yazdani & Henderson (2015): recurrent neural network.


Papers mainly use WSJ of Penn Treebank (TODO: confirm this). Although there is more in the treebank, only WSJ has been patched with gold NP-bracketing (Vadas & Curran, 2007)[11].


  • Label Attachment Score (LAS): % of tokens for which a system has predicted the correct HEAD and DEPREL
  • Unlabeled A5achment Score (UAS): % of tokens with correct HEAD •
  • Label Accuracy (LA): % of tokens with correct DEPREL

For in-depth discussion: Volokh (2013)[12]

Open problems[]

Ng and Curran (2015)[13]: "Our results showthat noun phrases remain challenging for dependency parsers, both in choosing the correct head, and in determining the internal structure. Punctuation, despite being commonly ignored in parsers and evaluation, causes substantial cascading errors when misattached."

McDonald and Nivre (2007)[14]: "Error propagation is an issue for MaltParser, which typically performs worse on long sentences, long dependency arcs and arcs higher in the graphs". This is true for gready transition-based parsers in general. Beam search can alleviate the problem but doesn't solve it. Global inference (Andor et al., 2016)[15] might help but empirical verification is needed.


  1. Vaswani, A., & Sagae, K. (2016). Efficient Structured Inference for Transition-Based Parsing with Neural Networks and Error States. Transactions of the Association for Computational Linguistics, 4, 183–196.
  2. Marco Kuhlmann, Carlos G´ omez-Rodrıguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceed- ings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 673–682.
  3. Attardi, G. (2006). Experiments with a Multilanguage Non-Projective Dependency Parser. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X) (pp. 166–170). New York City: Association for Computational Linguistics. Retrieved from
  4. 4.0 4.1 Chen, D., & Manning, C. (2014). A Fast and Accurate Dependency Parser using Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 740–750). Doha, Qatar: Association for Computational Linguistics.
  5. Dyer, C., Ballesteros, M., Ling, W., Matthews, A., & Smith, N. A. (2015). Transition-Based Dependency Parsing with Stack Long Short-Term Memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 334–343). Retrieved from
  6. Kiperwasser, E., & Goldberg, Y. (2016). Simple and Accurate Dependency Parsing Using Bidirectional {LSTM} Feature Representations. CoRR, abs/1603.0.
  7. Yoav Goldberg and Michael Elhadad. 2010. An effi- cient algorithm for easy-first non-directional dependency parsing. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguis- tics (NAACL HLT), pages 742–750.
  8. Kiperwasser, E., & Goldberg, Y. (2016). Easy-First Dependency Parsing with Hierarchical Tree LSTMs. Arxiv. Retrieved from
  9. Zhang, X., Cheng, J., & Lapata, M. (2016). Dependency Parsing as Head Selection. Retrieved from
  10. Yazdani, Majid, and James Henderson. "Incremental Recurrent Neural Network Dependency Parser with Search-based Discriminative Training." (2015): 142-152.
  11. Vadas, D., & Curran, J. R. (2007). Adding noun phrase structure to the Penn Treebank. 45th Annual Meeting of the Association of Computational Linguistics, (June), 240–247. Retrieved from
  12. Volokh, Alexander. "Performance-oriented dependency parsing." (2013). PDF
  13. Ng, D., & Curran, J. R. (2015). Identifying Cascading Errors using Constraints in Dependency Parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (pp. 1148–1158). Beijing: ACL.
  14. McDonald, R., & Nivre, J. (2007). Characterizing the Errors of Data-Driven Dependency Parsing Models. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).
  15. Andor, D., Alberti, C., Weiss, D., Severyn, A., Presta, A., Ganchev, K., Petrove, S., Collins, M. (2016). Globally Normalized Transition-Based Neural Networks., cs.CL.