A transition-based approach solves an NLP task through a series of transitions, each of which adds one label or slightly changes the system's internal state. The approach is most popular in dependency parsing but is also used for other tasks, e.g. multi-word expression recognition and semantic role labeling. It is closely related to reinforcement learning, though the connection is not well studied in the literature.
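The general idea can be illustrated with a minimal sketch of a greedy transition loop. Everything here is an assumption made for illustration: the state layout, the names `run_transitions` and `toy_policy`, and the capitalization rule that stands in for a trained classifier. It is not taken from any of the cited systems.

```python
from typing import Callable, List, Tuple

# Hypothetical state: (index of the next token, labels emitted so far).
State = Tuple[int, Tuple[str, ...]]


def run_transitions(tokens: List[str],
                    choose: Callable[[List[str], State], str]) -> List[str]:
    """Apply one transition per step until every token has a label."""
    state: State = (0, ())
    while state[0] < len(tokens):
        position, labels = state
        # A real system would query a trained model here.
        label = choose(tokens, state)
        # The transition adds one label and advances the internal state.
        state = (position + 1, labels + (label,))
    return list(state[1])


def toy_policy(tokens: List[str], state: State) -> str:
    """Stand-in for a classifier: label capitalized tokens as NAME."""
    position, _ = state
    return "NAME" if tokens[position][0].isupper() else "O"


labels = run_transitions(["Alice", "met", "Bob"], toy_policy)
```

Because the policy sees the whole state, including earlier decisions, each choice can depend on the labels already emitted, which is what distinguishes this setup from labeling each token independently.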

TODO: Daumé III (2006)[1]: entity detection, coreference resolution

TODO: history: first article is Yamada & Matsumoto (2003)

Yamada, H., & Matsumoto, Y. (2003). Statistical Dependency Analysis with Support Vector Machines. In Proceedings of IWPT (pp. 195–206).

Syntactic parsing

Constituency parsing

Ratnaparkhi (1999)[2] proposed a maximum entropy model for transition-based constituency parsing.

(Zhang and Clark, 2009[3])

Dependency parsing

One of the most popular transition systems for dependency parsing is arc-eager. It consists of a buffer holding the tokens still to be processed and a stack holding (the heads of) tree fragments. A transition moves one token from the buffer to the stack, removes a token from the stack, or creates a dependency. Other systems may use different transitions or additional data structures, such as a second stack.
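The four arc-eager transitions can be sketched as follows. This is a minimal, unlabeled illustration under stated assumptions: tokens are represented by their 1-based positions, index 0 is an artificial ROOT, and the names `Config`, `parse`, and `TRANSITIONS` are hypothetical. The action sequence is supplied by hand, whereas a real parser would predict each action with a classifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Config:
    stack: List[int]                    # partially processed tokens
    buffer: List[int]                   # tokens still to be processed
    heads: Dict[int, int] = field(default_factory=dict)  # dependent -> head


def shift(c: Config) -> None:
    """Move the first buffer token onto the stack."""
    c.stack.append(c.buffer.pop(0))


def left_arc(c: Config) -> None:
    """Make the first buffer token the head of the stack top, then pop it."""
    dep = c.stack[-1]
    assert dep != 0 and dep not in c.heads, "stack top must be headless non-root"
    c.heads[dep] = c.buffer[0]
    c.stack.pop()


def right_arc(c: Config) -> None:
    """Make the stack top the head of the first buffer token, then push it."""
    dep = c.buffer.pop(0)
    c.heads[dep] = c.stack[-1]
    c.stack.append(dep)


def reduce_top(c: Config) -> None:
    """Pop a stack token that has already received its head."""
    assert c.stack[-1] in c.heads, "stack top must already have a head"
    c.stack.pop()


TRANSITIONS = {"SHIFT": shift, "LEFT-ARC": left_arc,
               "RIGHT-ARC": right_arc, "REDUCE": reduce_top}


def parse(n_tokens: int, actions: List[str]) -> Dict[int, int]:
    """Run a given action sequence and return the dependent -> head map."""
    c = Config(stack=[0], buffer=list(range(1, n_tokens + 1)))
    for action in actions:
        TRANSITIONS[action](c)
    return c.heads


# "She eats fish": 1=She, 2=eats, 3=fish, 0=ROOT (illustrative sentence)
heads = parse(3, ["SHIFT", "LEFT-ARC", "RIGHT-ARC", "RIGHT-ARC"])
```

In this run, "She" and "fish" both attach to "eats", and "eats" attaches to ROOT. Each token enters the stack at most once, so the number of transitions is linear in sentence length.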

As of 2016, transition-based dependency parsing holds the state of the art in dependency parsing and has created buzz beyond research circles with Google's release of Parsey McParseface.

Semantic parsing

Semantic role labeling

Swayamdipta et al. (2016)[4]: joint dependency parsing + SRL.

Deep semantic parsing

Zhang et al. (2016)[5]:

"We conduct experiments on CCG-grounded functor–argument analysis, LFG-grounded grammatical relation analysis, and HPSG-grounded semantic dependency analysis for English and Chinese. Experiments demonstrate that data-driven models with appropriate transition systems can produce high-quality deep dependency analysis, comparable to more complex grammar-driven models. Experiments also indicate the effectiveness of the heterogeneous design of transition systems for parser ensemble, transition combination, as well as tree approximation for statistical disambiguation."

Named-entity recognition

Lample et al. (2016)[6]

Lexical tasks (combined with higher tasks)

Constant and Nivre (2016)[7] create a transition-based system to solve MWE recognition and dependency parsing jointly. TODO

Word segmentation + POS tagging + parsing: Hatori et al. (2012)[8]

References

  1. Daumé III, H. (2006). Practical Structured Learning Techniques for Natural Language Processing. University of Southern California.
  2. A. Ratnaparkhi. 1999. Learning to parse natural language with maximum entropy models. Machine Learning, 34(1):151–175
  3. Yue Zhang and Stephen Clark. 2009. Transition-based parsing of the Chinese Treebank using a global discriminative model. In Proceedings of IWPT, Paris, France, October.
  4. Swayamdipta, S., Ballesteros, M., Dyer, C., & Smith, N. A. (2016). Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs.
  6. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C. (2016). Neural Architectures for Named Entity Recognition. Arxiv, 1–10.
  7. Constant, M., & Nivre, J. (2016). A Transition-Based System for Joint Lexical and Syntactic Analysis. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 161–171). Association for Computational Linguistics.
  8. Hatori, Jun, et al. "Incremental joint approach to word segmentation, POS tagging, and dependency parsing in Chinese." Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Association for Computational Linguistics, 2012.