This page documents the steps needed to reproduce the results of Chen & Manning (2014)[1] for English (including a re-implementation) and makes explicit the decisions that aren't covered in the paper.
- Obtain data: the WSJ part of the Penn Treebank. Sections 02-21 for training, 22 for development, 23 for testing.
- I first tried the revised version (LDC2015T13), but the LTH converter doesn't work with it, so I used Penn Treebank 3 instead.
- Assign POS tags using the Stanford POS tagger with ten-way jackknifing of the training data
- Reported accuracy: ≈ 97.3%
- I used version 3.6.0, downloaded here, and followed the instructions in the JavaDoc.
- Reused english-bidirectional-distsim.tagger.props. Downloaded the word clusters. Fixed a crash.
- Instructions say: "The part-of-speech tags used as input for training and testing were generated by the Stanford POS Tagger (using the bidirectional5words model)."
- It's not clear how to divide the folds, which can make a difference. I divided by sentences and got 97.18% accuracy; dividing by documents wasn't better. (The split is sketched below.)
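- For concreteness, a minimal sketch of the sentence-level fold split (my own reading of ten-way jackknifing; the modulo-based assignment and helper structure are assumptions, not from the paper):
import java.util.ArrayList;
import java.util.List;

class Jackknife {
    // Ten-way jackknifing by sentences: each sentence is held out in exactly
    // one fold; a tagger trained on the other nine folds then tags it.
    static List<List<String>> splitAndTag(List<String> sentences) {
        int k = 10;
        List<List<String>> taggedFolds = new ArrayList<>();
        for (int fold = 0; fold < k; fold++) {
            List<String> train = new ArrayList<>();
            List<String> heldOut = new ArrayList<>();
            for (int i = 0; i < sentences.size(); i++) {
                if (i % k == fold) heldOut.add(sentences.get(i));
                else train.add(sentences.get(i));
            }
            // Train a tagger on `train` and tag `heldOut` here; the actual
            // Stanford tagger calls are omitted.
            taggedFolds.add(heldOut);
        }
        return taggedFolds;
    }
}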
- Constituent-to-dependency conversion:
- LTH Constituent-to-Dependency Conversion Tool
- Downloaded pennconverter
- The paper specifies neither the command-line options nor which conversion convention was used
- Head-finding rules matter
- The default is CoNLL-2008 conventions and CoNLL-X file format
- I tried -oldLTH and -conll2007, but neither splits tokens with slashes (different from footnote 6, page 745)
- Tried -rightBranching=false and the performance of MaltParser was low: around 80% instead of 90%
- Command: java -jar pennconverter.jar (full invocation sketched below)
- The converter raised an error on one sentence, which I skipped. I submitted a question on Stack Overflow.
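- For reference, pennconverter reads mrg trees on stdin and writes CoNLL on stdout, so the full invocation looks roughly like this (file names are placeholders):
java -jar pennconverter.jar < wsj-02-21.mrg > wsj-02-21.conll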
- Stanford Basic Dependencies
- Use Stanford parser v3.3.0 (page 745), downloaded here under the name stanford-parser-full-2013-11-12.
- Convert the Penn Treebank to Stanford Basic Dependencies using:
java -cp stanford-parser-full-2014-10-31/stanford-parser.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure -basic -conllx -originalDependencies -treeFile xxx
- Measure statistics: sentences, words, POS tags, labels, percentage of projective trees (Table 3)
- TODO (a projectivity test is sketched below)
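- Counting the projective percentage needs a projectivity test; a minimal sketch (my own; tokens are 1-indexed, 0 denotes the root, heads[0] is unused):
class Projectivity {
    // heads[d] is the head of token d.
    static boolean isProjective(int[] heads) {
        for (int d = 1; d < heads.length; d++) {
            int h = heads[d];
            int lo = Math.min(h, d), hi = Math.max(h, d);
            for (int k = lo + 1; k < hi; k++) {
                // The arc (h, d) is projective iff every token strictly
                // between h and d is a descendant of h: walk from k toward
                // the root and check that the path passes through h.
                int a = k;
                while (a != 0 && a != h) a = heads[a];
                if (a != h) return false;
            }
        }
        return true;
    }
}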
- Evaluation tool:
- Downloaded MaltEval
- Should I use the CoNLL-X eval script instead? What is the difference between them?
- Stanford also provides an evaluation tool: "The package includes a tool for scoring of generic dependency parses, in a class edu.stanford.nlp.trees.DependencyScoring. This tool measures scores for dependency trees, doing F1 and labeled attachment scoring. The included usage message gives a detailed description of how to use the tool."
- Counter-intuitive observation: counting punctuation actually decreases UAS and LAS by ~3%, i.e. the parser gets punctuation wrong more often than the average token (a scoring sketch follows).
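- To make the punctuation effect reproducible, here is a sketch of UAS/LAS scoring with punctuation optionally excluded. Punctuation is identified by gold POS tag, the usual convention in this literature; the method signature and array layout are my own:
import java.util.Set;

class Scorer {
    static final Set<String> PUNCT = Set.of("``", "''", ",", ".", ":");

    // Returns {UAS, LAS} in percent; excludePunct toggles the effect above.
    static double[] score(int[] goldHead, String[] goldLabel,
                          int[] predHead, String[] predLabel,
                          String[] pos, boolean excludePunct) {
        int total = 0, uas = 0, las = 0;
        for (int i = 0; i < goldHead.length; i++) {
            if (excludePunct && PUNCT.contains(pos[i])) continue;
            total++;
            if (goldHead[i] == predHead[i]) {
                uas++;
                if (goldLabel[i].equals(predLabel[i])) las++;
            }
        }
        return new double[]{100.0 * uas / total, 100.0 * las / total};
    }
}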
- Run the Stanford neural parser on the data and measure results.
- Download stanford-parser-full-2014-10-31.zip as instructed here (example commands below)
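- The nndep page documents training and testing commands along these lines (file paths and the embedding file are placeholders; double-check the exact flags against the page):
java -cp stanford-parser-full-2014-10-31/stanford-parser.jar edu.stanford.nlp.parser.nndep.DependencyParser -trainFile train.conllx -devFile dev.conllx -embedFile embeddings.txt -model model.txt.gz
java -cp stanford-parser-full-2014-10-31/stanford-parser.jar edu.stanford.nlp.parser.nndep.DependencyParser -model model.txt.gz -testFile test.conllx -outFile test-out.conllx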
- Run off-the-shelf MaltParser and MSTParser on the dev and test sets (a tentative MaltParser invocation is below).
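- From my reading of the MaltParser user guide, learning and parsing go through a named configuration via -c and a mode via -m; treat the following as an unverified sketch (jar and file names are placeholders):
java -jar maltparser.jar -c wsj -i train.conll -m learn
java -jar maltparser.jar -c wsj -i test.conll -o test-out.conll -m parse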
- Implement oracle (a sketch appears after the next two items)
- Implement parser
- Minuscule detail in the implementation: a right child is to the right of the node of interest, and a left child is to its left.
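- For the oracle, the standard static oracle for the arc-standard system suffices; a minimal sketch (naming and data layout are mine, not the paper's):
import java.util.Deque;
import java.util.Iterator;
import java.util.Queue;

class Oracle {
    // heads[i] is the gold head of token i; s1 is the stack top (pushed via
    // Deque.push, so iteration starts at the top), s2 the element below it.
    static String next(Deque<Integer> stack, Queue<Integer> buffer, int[] heads) {
        if (stack.size() >= 2) {
            Iterator<Integer> it = stack.iterator();
            int s1 = it.next(), s2 = it.next();
            if (heads[s2] == s1) return "LEFT-ARC";
            if (heads[s1] == s2) {
                // RIGHT-ARC only once s1 has collected all its dependents,
                // since s1 becomes unreachable after being reduced.
                boolean done = true;
                for (int b : buffer) if (heads[b] == s1) done = false;
                if (done) return "RIGHT-ARC";
            }
        }
        return "SHIFT";
    }
}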
- Implement neural net
- Dropout: it isn't clear whether dropout is applied to the output of the embedding layer or to the hidden layer; from the source code, it is applied to the hidden-layer units.
- The paper implies that the learning rate was varied during training ("initial learning rate of Adagrad α = 0.01") but doesn't reveal the method (annealing/linear/etc.) or by how much.
- The paper says "A slight variation is that we compute the softmax probabilities only among the feasible transitions in practice." but the implementation actually computes all probabilities (see the sketch below).
- Note from the source code: the output layer doesn't have bias terms, which is consistent with the paper: there is no bias in the feature templates.
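- For greedy decoding only the argmax matters, so restricting the softmax to feasible transitions versus computing all probabilities changes nothing in the output; a sketch of the decode step under the two notes above (array layout is mine):
class Decoder {
    // Scores transitions as W2·h with no output bias and takes the argmax
    // over feasible transitions only; softmax normalization is irrelevant
    // to the argmax.
    static int predict(double[] hidden, double[][] w2, boolean[] feasible) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int t = 0; t < w2.length; t++) {
            if (!feasible[t]) continue;          // skip illegal transitions
            double score = 0.0;                  // no bias term
            for (int j = 0; j < hidden.length; j++)
                score += w2[t][j] * hidden[j];
            if (score > bestScore) { bestScore = score; best = t; }
        }
        return best;
    }
}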
References
[1] Chen, D., & Manning, C. (2014). A Fast and Accurate Dependency Parser using Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 740–750). Doha, Qatar: Association for Computational Linguistics.