This page documents the necessary steps to reproduce the results of Chen & Manning (2014)[1] for English (including a re-implementation) and makes explicit the decisions that aren't covered in the paper.
- Obtain data: WSJ part of the PENN Treebank. Sections 02-21 for training, 22 for development, 23 for testing.
- I first tried the revised version LDC2015T13, but the LTH converter doesn't work with that version, so I used PENN Treebank 3.
- Assign POS tags using Stanford POS tagger with ten-way jackknifing of the training data
- Reported accuracy: ≈ 97.3%
- I used version 3.6.0 downloaded here and followed instructions in the JavaDoc.
- Reused english-bidirectional-distsim.tagger.props. Downloaded word clusters. Fixed a crash. (Which used the bidirectional5words model.)
- Instructions say: "The part-of-speech tags used as input for training and testing were generated by the Stanford POS Tagger (using the bidirectional5words model)."
- It's not clear how to divide the folds, which can make a difference. I divided by sentences; the accuracy is 97.18%. I also tried dividing by documents and it wasn't better. (A sketch of the sentence-level fold split follows this item.)
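As a concrete illustration of the sentence-level split, here is a minimal Python sketch of ten-way jackknifing. The fold layout and the file names in the commented usage are hypothetical; the actual training and tagging of each fold is done externally with the Stanford POS Tagger.

# Sketch: ten-way jackknifing over sentences (hypothetical file layout).
# Each held-out fold is POS-tagged by a tagger trained on the other nine folds.

def jackknife_folds(sentences, k=10):
    """Yield (train_sentences, heldout_sentences) for each of k folds,
    splitting by sentence index (round-robin keeps fold sizes balanced)."""
    folds = [sentences[i::k] for i in range(k)]
    for i in range(k):
        heldout = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, heldout

# Usage sketch (reader/writer helpers and file names are hypothetical):
# sentences = read_sentences("wsj_02-21.mrg")
# for fold_id, (train, heldout) in enumerate(jackknife_folds(sentences)):
#     write_sentences(f"fold{fold_id}.train", train)
#     write_sentences(f"fold{fold_id}.heldout", heldout)
#     # train the Stanford POS Tagger on fold{fold_id}.train,
#     # then tag fold{fold_id}.heldout with that model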
- Constituent-to-dependency conversion:
- LTH Constituent-to-Dependency Conversion Tool
- Downloaded pennconverter
- The paper didn't specify the command-line options or the type of conversion used as reference
- Head-finding rules matter
- The default is CoNLL-2008 conventions and CoNLL-X file format
- I tried -oldLTH and -conll2007 but it doesn't split tokens with slashes (different from footnote 6, page 745).
- Tried -rightBranching=false and the performance of MaltParser was low: around 80% instead of 90%.
- Command:
java -jar pennconverter.jar
- Error in one sentence, skipped. I submitted a question on Stackoverflow.
- Stanford Basic Dependencies
- Use Stanford parser v3.3.0 (page 745), downloaded here under the name stanford-parser-full-2013-11-12.
- Convert PENN Treebank to Stanford Basic Dependencies using:
java -cp stanford-parser-full-2014-10-31/stanford-parser.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure -basic -conllx -originalDependencies -treeFile xxx
- Measure statistics: sentences, words, POS tags, labels, projective percentage (Table 3)
- TODO (a sketch of computing these from a CoNLL-X file is given after this item)
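For the TODO above, one way to compute those statistics from a CoNLL-X file is sketched below. The column indices follow the CoNLL-X format, and the projectivity test is the standard crossing-arcs check; none of this is code from the paper.

# Sketch: corpus statistics from a CoNLL-X file (sentences, words, POS tags,
# dependency labels, and the percentage of projective trees).

def read_conllx(path):
    """Yield sentences as lists of (form, pos, head, deprel) tuples."""
    sent = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                if sent:
                    yield sent
                    sent = []
                continue
            cols = line.split("\t")
            # CoNLL-X columns: ID FORM LEMMA CPOS POS FEATS HEAD DEPREL ...
            sent.append((cols[1], cols[4], int(cols[6]), cols[7]))
    if sent:
        yield sent

def is_projective(heads):
    """heads[i-1] is the head of token i (1-based); 0 means root."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:  # crossing arcs => non-projective
                return False
    return True

def stats(path):
    n_sent = n_words = n_proj = 0
    pos_set, label_set = set(), set()
    for sent in read_conllx(path):
        n_sent += 1
        n_words += len(sent)
        pos_set.update(pos for _, pos, _, _ in sent)
        label_set.update(lab for _, _, _, lab in sent)
        n_proj += is_projective([head for _, _, head, _ in sent])
    return n_sent, n_words, len(pos_set), len(label_set), 100.0 * n_proj / n_sent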
- Evaluation tool:
- Downloaded MaltEval
- Should I use the CoNLL-X eval script instead? What is the difference between them? (A minimal UAS/LAS computation is sketched after this item.)
- Stanford also provides evaluation tool: "The package includes a tool for scoring of generic dependency parses, in a class edu.stanford.nlp.trees.DependencyScoring. This tool measures scores for dependency trees, doing F1 and labeled attachment scoring. The included usage message gives a detailed description of how to use the tool."
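Whichever tool is used, UAS/LAS come down to per-token comparisons of predicted heads and labels against gold. The sketch below shows only that bare computation; it deliberately ignores punctuation handling, which is where such tools often differ, so it is an illustration rather than a replacement for MaltEval or the CoNLL-X script.

# Sketch: unlabeled and labeled attachment scores over token-aligned
# gold and system parses (punctuation handling intentionally omitted).

def attachment_scores(gold_sents, sys_sents):
    """Each sentence is a list of (head, deprel) pairs, token-aligned."""
    total = correct_heads = correct_labeled = 0
    for gold, sys in zip(gold_sents, sys_sents):
        for (g_head, g_lab), (s_head, s_lab) in zip(gold, sys):
            total += 1
            if g_head == s_head:
                correct_heads += 1
                if g_lab == s_lab:
                    correct_labeled += 1
    return 100.0 * correct_heads / total, 100.0 * correct_labeled / total  # (UAS, LAS)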
- Run Stanford neural parser on the data and measure results.
- Download stanford-parser-full-2014-10-31.zip as instructed here
- Run off-the-shelf MaltParser and MSTParser on dev and test sets.
- Implement oracle (a sketch of the arc-standard static oracle is given below)
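A minimal sketch of a static oracle for the arc-standard system (SHIFT, LEFT-ARC, RIGHT-ARC) that the paper's parser uses. It assumes a projective gold tree, leaves out arc labels and most precondition checks, and the function and argument names are mine, not the paper's.

# Sketch: static arc-standard oracle. gold_head maps 1-based token id to its
# head id (0 = root); `attached` is the set of tokens whose head arc has
# already been built; stack[-1] is the top of the stack.

def oracle(stack, buffer, gold_head, attached):
    """Return the next gold transition for an arc-standard configuration,
    assuming the gold tree is projective."""
    if len(stack) >= 2:
        top, second = stack[-1], stack[-2]
        # LEFT-ARC: the second-top token is a gold dependent of the top.
        if gold_head.get(second) == top:
            return "LEFT-ARC"
        # RIGHT-ARC: the top is a gold dependent of the second-top, and all
        # of the top's own dependents have already been attached.
        if gold_head.get(top) == second and all(
            dep in attached for dep, head in gold_head.items() if head == top
        ):
            return "RIGHT-ARC"
    # SHIFT is only valid while the buffer is non-empty.
    return "SHIFT" if buffer else None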
- Implement neural net
- Dropout: it wasn't clear from the paper whether dropout is applied to the output of the embedding layer or to the hidden layer; it is applied to the hidden-layer units.
- The paper implies that the learning rate was varied during training ("initial learning rate of Adagrad α = 0.01.") but doesn't reveal the method (annealing/linear/etc.) or by how much.
- The paper says "A slight variation is that we compute the softmax probabilities only among the feasible transitions in practice." but the implementation (https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/parser/nndep/Classifier.java#L247) actually computes all probabilities.
- Note from the source code (https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/parser/nndep/Classifier.java#L126): the output layer doesn't have bias terms, which is consistent with the paper -- there is no bias in the feature templates. (A sketch of the forward pass covering these points follows this list.)
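Putting the notes above together, here is a rough sketch of the classifier's forward pass as the paper describes it: the concatenated feature embeddings feed a single hidden layer with the cube activation (with dropout on the hidden units at training time), the output layer has no bias term, and the softmax can be restricted to the feasible transitions. Shapes, names, and the dropout details are illustrative assumptions, not code taken from CoreNLP.

import numpy as np

# Sketch of the forward pass (illustrative shapes and names).
# x: concatenated embeddings of the selected word/POS/label features.
# W1, b1: hidden layer; W2: output layer with no bias, as in the paper.

def forward(x, W1, b1, W2, feasible, dropout_p=0.5, train=False, rng=None):
    h = (W1 @ x + b1) ** 3                     # cube activation
    if train:                                  # dropout on hidden-layer units
        rng = rng or np.random.default_rng()
        h = h * (rng.random(h.shape) > dropout_p)
    scores = W2 @ h                            # no bias on the output layer
    # Softmax restricted to feasible transitions (the paper's description);
    # the released code scores all transitions instead.
    masked = np.full_like(scores, -np.inf)
    masked[feasible] = scores[feasible]
    e = np.exp(masked - masked[feasible].max())
    return e / e.sum()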
References
- ↑ Chen, D., & Manning, C. (2014). A Fast and Accurate Dependency Parser using Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 740–750). Doha, Qatar: Association for Computational Linguistics.