Revision as of 23:54, 18 September 2017
Lee et al. (2015)[1] use "two-sided bootstrap resampling statistical significance tests (Graham et al., 2014)"
Bugert et al. (2017)[2], Zhou et al. (2015)[3], and Nivre et al. (2009)[4] use McNemar's test.
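For reference, McNemar's test compares two systems on the same test items and looks only at the items where exactly one of the two is correct. Below is a minimal sketch of the exact (binomial) variant, assuming per-item gold labels and predictions are available. The function name `mcnemar_exact` and its arguments are hypothetical and not taken from any of the cited papers, which may well use the chi-squared approximation instead.

```python
from math import comb


def mcnemar_exact(gold, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts items only system A gets right
    and c counts items only system B gets right.
    """
    if not (len(gold) == len(pred_a) == len(pred_b)):
        raise ValueError("gold, pred_a and pred_b must have the same length")

    # Discordant pairs: exactly one of the two systems is correct.
    b = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a == g and p != g)
    c = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a != g and p == g)
    n = b + c
    if n == 0:
        # The systems never disagree in correctness: no evidence of a difference.
        return b, c, 1.0

    # Under the null hypothesis, b ~ Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


if __name__ == "__main__":
    gold = [1] * 10
    system_a = [1] * 10            # correct on every item
    system_b = [0] * 6 + [1] * 4   # wrong on the first six items
    print(mcnemar_exact(gold, system_a, system_b))
```

A small p-value suggests that the two systems' error rates differ. Items that both systems get right or both get wrong do not affect the result.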
I have seen some papers use Koehn's bootstrap resampling procedure (Koehn, 2004)[5].
Zapirain et al. (2013)[6]: "we checked for statistical significance using bootstrap resampling (100 samples) coupled with one-tailed paired t-test (Noreen 1989)."
Bengtson & Roth (2008)[7]: "paired non-parametric bootstrapping percentile test".
Batchkarov et al. (2016)[8] use "bootstrapping" to estimate variance and later hint at statistical significance.
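Several of the notes above mention paired bootstrap resampling. Below is a rough sketch of the general idea, assuming the evaluation metric is a plain average of per-item scores (for example 0/1 correctness). A corpus-level metric such as BLEU would instead have to be recomputed on every resampled test set. The function name `paired_bootstrap` and its parameters are hypothetical, and the details differ between the cited papers.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_samples=1000, seed=0):
    """Fraction of bootstrap resamples in which system A beats system B.

    scores_a and scores_b hold per-item scores for the same test items,
    in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same items")
    n = len(scores_a)
    if n == 0:
        raise ValueError("need at least one test item")

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    wins = 0
    for _ in range(n_samples):
        # Draw n test items with replacement; use the same draw for both
        # systems so that the comparison stays paired.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_samples


if __name__ == "__main__":
    a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
    b = [1, 0, 1, 0, 1, 0, 0, 1, 1, 0]
    print(paired_bootstrap(a, b))
```

Under this scheme, system A is usually reported as significantly better at the 95% level if it wins in at least 95% of the resamples.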
References
- [1] Lee, K., Artzi, Y., Choi, Y., & Zettlemoyer, L. (2015). Event Detection and Factuality Assessment with Non-Expert Supervision. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1643–1648.
- [2] Bugert, M., Puzikov, Y., Andreas, R., Eckle-kohler, J., Martin, T., & Mart, E. (2017). LSDSem 2017 : Exploring Data Generation Methods for the Story Cloze Test. The 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-Level Semantics (LSDSEM 2017), (2016), 56–61.
- [3] Zhou, Mengfei, Anette Frank, Annemarie Friedrich, and Alexis Palmer. “Semantically Enriched Models for Modal Sense Classification.” In Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics (LSDSem), p. 44. 2015.
- [4] Nivre, J., Kuhlmann, M., & Hall, J. (2009). An Improved Oracle for Dependency Parsing with Online Reordering. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09) (pp. 73–76). Paris, France: Association for Computational Linguistics.
- [5] Koehn, P. (2004). Statistical significance tests for machine translation evaluation. Proceedings of the Conference on Empirical Methods in Natural Language Processing, 4, 388–395. http://doi.org/10.1145/2063576.2063688
- [6] Zapirain, B., Agirre, E., Màrquez, L., & Surdeanu, M. (2013). Selectional Preferences for Semantic Role Classification. Computational Linguistics, 39(3).
- [7] Bengtson, E., & Roth, D. (2008). Understanding the value of features for coreference resolution. Proceedings of the Conference on Empirical Methods in Natural Language Processing - EMNLP ’08, 51(October), 294. http://doi.org/10.3115/1613715.1613756
- [8] ACL 2016. http://sro.sussex.ac.uk/62044/1/acl2016.pdf