Natural Language Understanding Wiki

Revision as of 23:54, 18 September 2017

Lee et al. (2015)[1] use "two-sided bootstrap resampling statistical significance tests (Graham et al., 2014)"

Bugert et al. (2017)[2], Zhou et al. (2015)[3], and Nivre et al. (2009)[4] use McNemar's test.
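
For reference, McNemar's test compares two systems on the same test items and looks only at the items where exactly one of them is correct. The sketch below is a minimal exact (binomial) version using the standard library only; the function name `mcnemar_exact` and the boolean-list input format are assumptions made for illustration, not something taken from the cited papers, which may have used the chi-squared variant instead.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired correctness labels.

    correct_a[i] / correct_b[i] say whether system A / B got item i right.
    Returns (b, c, p_value), where b counts items only A got right and
    c counts items only B got right. (Hypothetical helper for illustration.)
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must be scored on the same items")

    # Only discordant pairs carry information about a difference.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if not x and y)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the systems never disagree

    # Under the null hypothesis each discordant pair is a fair coin flip,
    # so min(b, c) follows Binomial(n, 0.5); double the lower tail.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

A typical use is to pass per-item correctness of a baseline and a new model and to report the difference as significant when the returned p-value is below the chosen threshold (for example 0.05).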

Some papers use Koehn's subsampling procedure (Koehn 2004)[5].
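
A rough sketch of paired resampling in the spirit of Koehn (2004): draw test sets of the original size with replacement, score both systems on each draw, and report how often system A comes out ahead. It assumes the metric is a sum of per-item scores, which is a simplification (corpus-level metrics such as BLEU would need to be recomputed on each resample); `paired_bootstrap_win_rate` and its parameters are hypothetical names.

```python
import random


def paired_bootstrap_win_rate(scores_a, scores_b, n_resamples=1000, seed=0):
    """Fraction of resampled test sets on which system A beats system B.

    scores_a[i] and scores_b[i] are per-item scores on the same item i.
    Assumption: the overall metric is the sum (or mean) of per-item scores.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same items")

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample item indices with replacement; the same indices are
        # used for both systems, which is what makes the test paired.
        idx = [rng.randrange(n) for _ in range(n)]
        total_a = sum(scores_a[i] for i in idx)
        total_b = sum(scores_b[i] for i in idx)
        if total_a > total_b:
            wins += 1
    return wins / n_resamples
```

Under this reading, a win rate of at least 0.95 would be taken as evidence that A is better than B at the 95% level.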

Zapirain et al. (2013)[6]: "we checked for statistical significance using bootstrap resampling (100 samples) coupled with one-tailed paired t-test (Noreen 1989)."

Bengtson & Roth (2008)[7]: "paired non-parametric bootstrapping percentile test".
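
The quoted phrase does not spell out the procedure, so the following is only one common reading of a paired bootstrap percentile test: resample the per-item score differences, take percentiles of the resampled mean difference, and treat the difference as significant when the interval excludes zero. The function name, the default of 2000 resamples, and the per-item score input are all assumptions for illustration.

```python
import random


def bootstrap_percentile_interval(scores_a, scores_b,
                                  n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired difference A - B.

    Returns (low, high); one reading of the test is to call the difference
    significant at level alpha when the interval does not contain 0.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same items")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(diffs)

    # Mean difference on each resample of the paired differences.
    means = sorted(
        sum(rng.choice(diffs) for _ in range(n)) / n
        for _ in range(n_resamples)
    )

    # Read the alpha/2 and 1 - alpha/2 percentiles off the sorted means.
    low = means[int((alpha / 2) * n_resamples)]
    high = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return low, high
```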

Batchkarov et al. (2016)[8] use "bootstrapping" to estimate variance and later hint at statistical significance.

References

  1. Lee, K., Artzi, Y., Choi, Y., & Zettlemoyer, L. (2015). Event Detection and Factuality Assessment with Non-Expert Supervision. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1643–1648.
  2. Bugert, M., Puzikov, Y., Andreas, R., Eckle-Kohler, J., Martin, T., & Mart, E. (2017). LSDSem 2017: Exploring Data Generation Methods for the Story Cloze Test. The 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-Level Semantics (LSDSEM 2017), 56–61.
  3. Zhou, Mengfei, Anette Frank, Annemarie Friedrich, and Alexis Palmer. “Semantically Enriched Models for Modal Sense Classification.” In Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics (LSDSem), p. 44. 2015.
  4. Nivre, J., Kuhlmann, M., & Hall, J. (2009). An Improved Oracle for Dependency Parsing with Online Reordering. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09) (pp. 73–76). Paris, France: Association for Computational Linguistics.
  5. Koehn, P. (2004). Statistical significance tests for machine translation evaluation. Proceedings of the Conference on Empirical Methods in Natural Language Processing, 4, 388–395. http://doi.org/10.1145/2063576.2063688
  6. Zapirain, B., Agirre, E., Màrquez, L., & Surdeanu, M. (2013). Selectional Preferences for Semantic Role Classification. Computational Linguistics, 39(3).
  7. Bengtson, E., & Roth, D. (2008). Understanding the value of features for coreference resolution. Proceedings of the Conference on Empirical Methods in Natural Language Processing - EMNLP ’08, 51(October), 294. http://doi.org/10.3115/1613715.1613756
  8. Batchkarov et al. (2016). ACL 2016. http://sro.sussex.ac.uk/62044/1/acl2016.pdf