Natural Language Understanding Wiki

Data


Notes about "standard split" in "Linguistic Structure Prediction", Noah A. Smith.

The "standard split" is sections 2-21 for training, 22 for development and 23 for testing, all in WSJ part of PENN Treebank.

It is used in, for example, Chen & Manning (2014)[1].
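
As a concrete illustration, below is a minimal sketch of assembling this split from a local copy of the WSJ data. The wsj/<section>/ directory layout and the .mrg extension are assumptions about how the treebank files are stored, not something prescribed by the sources above.

from pathlib import Path

# Assumed local layout: wsj/00/ ... wsj/24/, each holding *.mrg parse files.
WSJ_DIR = Path("wsj")

def section_files(sections):
    """Collect the parse files belonging to the given two-digit WSJ sections."""
    files = []
    for sec in sections:
        files.extend(sorted((WSJ_DIR / f"{sec:02d}").glob("*.mrg")))
    return files

train_files = section_files(range(2, 22))  # sections 02-21
dev_files = section_files([22])            # section 22
test_files = section_files([23])           # section 23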

Preprocessing: POS tagging

Although the Penn Treebank comes with hand-annotated POS tags, some researchers replace them with automatically predicted tags using a procedure called "10-way jackknifing". Koo et al. (2008)[2] explain:

"That is, we tagged each fold with the tagger trained on the other 9 folds."

Che et al. (2012)[3] state:

Training sentences in each fold were tagged using a model based on the other nine folds; development and test sentences were tagged using a model based on all ten of the training folds.

[...]

Since POS-tags are especially informative of Chinese dependencies (Li et al., 2011), we harmonized training and test data, using 10-way jackknifing (see §2.4). This method is more robust than training a parser with gold tags because it improves consistency, particularly for Chinese, where tagging accuracies are lower than in English. On development data, Mate scored worse given gold tags (75.4 versus 78.2%). [Berkeley’s performance suffered with jackknifed tags (76.5 versus 77.0%), possibly because it parses and tags better jointly]
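
A minimal sketch of this jackknifing scheme is shown below. The train_tagger function and its .tag() interface are hypothetical placeholders for whatever POS tagger is being used; the papers above do not prescribe a particular implementation.

def jackknife_pos_tags(train_sents, dev_sents, test_sents, train_tagger, n_folds=10):
    """10-way jackknifing: tag each training fold with a tagger trained on the
    other nine folds; tag dev/test with a tagger trained on all ten folds.

    `train_sents` etc. are lists of sentences, each a list of (word, gold_tag);
    `train_tagger(sents)` is assumed to return an object whose .tag(words)
    method yields a list of (word, predicted_tag) pairs.
    """
    folds = [train_sents[i::n_folds] for i in range(n_folds)]

    tagged_train = []
    for i, fold in enumerate(folds):
        rest = [s for j, f in enumerate(folds) if j != i for s in f]
        tagger = train_tagger(rest)              # trained on the other 9 folds
        tagged_train += [tagger.tag([w for w, _ in s]) for s in fold]

    full_tagger = train_tagger(train_sents)      # trained on all 10 folds
    tagged_dev = [full_tagger.tag([w for w, _ in s]) for s in dev_sents]
    tagged_test = [full_tagger.tag([w for w, _ in s]) for s in test_sents]
    return tagged_train, tagged_dev, tagged_test

How sentences are assigned to folds (round-robin here, contiguous blocks elsewhere) varies between papers and is usually not reported.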

Excerpt from Chen et al. (2015)

Chen et al. (2015)[4] also found 10-way jackknifing beneficial for dependency parsing (see the excerpt above).

This procedure is widely adopted, for example in Zhang & Nivre (2011)[5], Li et al. (2014)[6], Chen & Manning (2014)[1], and Pei et al. (2015)[7].

Read more at Jackknifing PennTreeBank.

Preprocessing: Constituent-to-dependency conversion

Simplistic head-finding rule

Used in Collins (1999)[8], Yamada & Matsumoto (2003)[9], Zhang & Clark (2008)[10], etc. This procedure is summarized in Johansson & Nugues (2007)[11]:

... based on the idea of assigning each constituent in the parse tree a unique head selected amongst the constituent’s children (Magerman, 1994). For example, the toy grammar below would select the noun as the head of an NP, the verb as the head of a VP, and VP as the head of an S consisting of a noun phrase and a verb phrase:

NP --> DT NN*
VP --> VBD* NP
S --> NP VP*

By following the child-parent links from the token level up to the root of the tree, we can label every constituent with a head token. The heads can then be used to create dependency trees: to determine the parent of a token in the dependency tree, we locate the highest constituent that it is the head of and select the head of its parent constituent.
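
The sketch below illustrates this head-percolation idea on a small tuple-based tree, using only the toy rules above; the data structure and the rule table are illustrative assumptions, not the actual head table of Collins (1999).

# Toy head rules from the grammar above: for each phrase label, the child
# labels to try as head, in priority order.
HEAD_RULES = {"NP": ["NN"], "VP": ["VBD"], "S": ["VP"]}

def is_leaf(node):
    # Leaves are (POS, word) pairs; internal nodes are (label, [children]).
    return isinstance(node[1], str)

def head_token(node):
    """Percolate heads upward: return the leaf node heading this constituent."""
    if is_leaf(node):
        return node
    label, children = node
    for wanted in HEAD_RULES.get(label, []):
        for child in children:
            if child[0] == wanted:
                return head_token(child)
    return head_token(children[-1])  # fallback: rightmost child heads the phrase

def to_dependencies(tree):
    """For every constituent, attach each non-head child's head token to the
    constituent's head token; this is equivalent to the parent-finding rule above."""
    arcs = []
    def visit(node):
        if is_leaf(node):
            return
        head = head_token(node)
        for child in node[1]:
            child_head = head_token(child)
            if child_head is not head:
                arcs.append((head[1], child_head[1]))  # (head word, dependent word)
            visit(child)
    visit(tree)
    return arcs

# "the dog barked": the NP attaches to the verb, the determiner to the noun.
tree = ("S", [("NP", [("DT", "the"), ("NN", "dog")]),
              ("VP", [("VBD", "barked")])])
print(to_dependencies(tree))  # [('barked', 'dog'), ('dog', 'the')]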

Extended head-finding rule

Proposed in Johansson & Nugues (2007)[11], this procedure "makes better use of the existing information in the Treebank". Among other things, it introduces more dependency labels and captures more linguistic phenomena. The authors reported a drop in dependency parsing accuracy but an improvement in semantic role labeling when the resulting dependencies are used as input.

This conversion was used in: TODO

References

  1. Chen, D., & Manning, C. (2014). A Fast and Accurate Dependency Parser using Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 740–750). Doha, Qatar: Association for Computational Linguistics.
  2. Koo, T., Carreras Pérez, X., & Collins, M. (2008). Simple Semi-supervised Dependency Parsing. In Proceedings of ACL-08: HLT.
  3. Che, W., Spitkovsky, V. I., & Liu, T. (2012). A Comparison of Chinese Parsers for Stanford Dependencies. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2 (pp. 11-16). Association for Computational Linguistics.
  4. Chen, X. Y., Zhao, Y. P., Shang, J. Y., & Shi, X. W. (2015, June). A comparison of part-of-speech tag sets for transition-based dependency parsing. In Information, Computer and Application Engineering: Proceedings of the International Conference on Information Technology and Computer Application Engineering (ITCAE 2014), Hong Kong, China, 10-11 December 2014 (p. 269). CRC Press.
  5. Zhang, Y., & Nivre, J. (2011). Transition-based dependency parsing with rich non-local features. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 188–193).
  6. Li, Z., Zhang, M., Che, W., Liu, T., & Chen, W. (2014). Joint Optimization for Chinese POS Tagging and Dependency Parsing. Audio, Speech, and Language Processing, IEEE/ACM Transactions on, 22(1), 274-286.
  7. Pei, W., Ge, T., & Chang, B. (2015). An Effective Neural Network Model for Graph-based Dependency Parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 313–322). Association for Computational Linguistics.
  8. Collins, M. (1999). Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
  9. Yamada, H., & Matsumoto, Y. (2003). Statistical dependency analysis with support vector machines. In Proceedings of IWPT (Vol. 3, pp. 195-206).
  10. Zhang, Y., & Clark, S. (2008). A Tale of Two Parsers: Investigating and combining graph-based and transition-based dependency parsing using beam-search. In Proceedings of EMNLP 2008 (pp. 562-571).
  11. Johansson, R., & Nugues, P. (2007). Extended Constituent-to-dependency Conversion for English. In Proceedings of NODALIDA 2007 (pp. 105–112). Tartu, Estonia.