Changes: Significance test

Revision as of 23:54, 18 September 2017

Lee et al. (2015)^[1] use "two-sided bootstrap resampling statistical significance tests (Graham et al., 2014)"

Bugert et al. (2017)^[2], Zhou et al. (2015)^[3], and Nivre et al. (2009)^[4] use McNemar's test
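For reference, a minimal sketch of an exact two-sided McNemar's test on paired system outputs. It assumes each system is scored per item as correct or incorrect. The function name `mcnemar_exact` and the boolean-list input format are illustrative choices, not taken from any of the cited papers, which may use the chi-square approximation instead.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired per-item correctness flags.

    Returns (b, c, p_value), where b counts items only system A got right
    and c counts items only system B got right.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must be scored on the same items")

    # Only discordant pairs (exactly one system correct) carry information.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if not x and y)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the systems never disagree

    # Under the null hypothesis each discordant pair favours either system
    # with probability 0.5, so min(b, c) follows Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

Calling `mcnemar_exact(flags_a, flags_b)` on two equally long lists of booleans gives the discordant counts and the p-value. Items on which both systems agree do not affect the result.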

I have seen some papers use Koehn's bootstrap resampling procedure (Koehn, 2004)^[5]
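Koehn (2004) describes paired bootstrap resampling for comparing two systems on the same test set. The sketch below assumes the metric is a simple sum (equivalently, mean) of per-item scores. That is a simplification, since Koehn applies the idea to corpus-level BLEU. The name `paired_bootstrap` and its parameters are hypothetical.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_samples=1000, seed=0):
    """Fraction of bootstrap resamples on which system A outscores system B.

    scores_a and scores_b are per-item scores on the same test items.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")

    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(scores_a)
    wins = 0
    for _ in range(n_samples):
        # Draw n test items with replacement, using the SAME indices for
        # both systems; this is what makes the comparison paired.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_samples
```

If the returned fraction is at least 0.95, system A is typically reported as better than system B at the 95% level. For corpus-level metrics the two sums would be replaced by the metric recomputed on each resampled test set.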

Zapirain et al. (2013)^[6]: "we checked for statistical significance using bootstrap resampling (100 samples) coupled with one-tailed paired t-test (Noreen 1989)."

Bengtson & Roth (2008)^[7]: "paired non-parametric bootstrapping percentile test".

Batchkarov et al. (2016)^[8] use "bootstrapping" to estimate variance and later hint at statistical significance.

References

[1] Lee, K., Artzi, Y., Choi, Y., & Zettlemoyer, L. (2015). Event Detection and Factuality Assessment with Non-Expert Supervision. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1643–1648.
[2] Bugert, M., Puzikov, Y., Andreas, R., Eckle-Kohler, J., Martin, T., & Mart, E. (2017). LSDSem 2017: Exploring Data Generation Methods for the Story Cloze Test. The 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-Level Semantics (LSDSEM 2017), (2016), 56–61.
[3] Zhou, Mengfei, Anette Frank, Annemarie Friedrich, and Alexis Palmer. “Semantically Enriched Models for Modal Sense Classification.” In Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics (LSDSem), p. 44. 2015.
[4] Nivre, J., Kuhlmann, M., & Hall, J. (2009). An Improved Oracle for Dependency Parsing with Online Reordering. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09) (pp. 73–76). Paris, France: Association for Computational Linguistics.
[5] Koehn, P. (2004). Statistical significance tests for machine translation evaluation. Proceedings of the Conference on Empirical Methods in Natural Language Processing, 4, 388–395. http://doi.org/10.1145/2063576.2063688
[6] Zapirain, B., Agirre, E., Màrquez, L., & Surdeanu, M. (2013). Selectional Preferences for Semantic Role Classification. Computational Linguistics, 39(3).
[7] Bengtson, E., & Roth, D. (2008). Understanding the value of features for coreference resolution. Proceedings of the Conference on Empirical Methods in Natural Language Processing - EMNLP ’08, 51(October), 294. http://doi.org/10.3115/1613715.1613756
[8] ACL 2016. http://sro.sussex.ac.uk/62044/1/acl2016.pdf

@@ Line 2: / Line 2: @@
 significance tests (Graham et al., 2014)"
-Bugert et al. 2017<ref>Bugert, M., Puzikov, Y., Andreas, R., Eckle-kohler, J., Martin, T., & Mart, E. (2017). LSDSem 2017 : Exploring Data Generation Methods for the Story Cloze Test. The 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-Level Semantics (LSDSEM 2017), (2016), 56–61.</ref>; Zhou et al. 2015<ref>Zhou, Mengfei, Anette Frank, Annemarie Friedrich, and Alexis Palmer. “Semantically Enriched Models for Modal Sense Classification.” In ''Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics (LSDSem)'', p. 44. 2015.</ref> use McNemar's test
+Bugert et al. 2017<ref>Bugert, M., Puzikov, Y., Andreas, R., Eckle-kohler, J., Martin, T., & Mart, E. (2017). LSDSem 2017 : Exploring Data Generation Methods for the Story Cloze Test. The 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-Level Semantics (LSDSEM 2017), (2016), 56–61.</ref>; Zhou et al. 2015<ref>Zhou, Mengfei, Anette Frank, Annemarie Friedrich, and Alexis Palmer. “Semantically Enriched Models for Modal Sense Classification.” In ''Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics (LSDSem)'', p. 44. 2015.</ref>, Nirve et al. (2009)<ref>Nivre, J., Kuhlmann, M., & Hall, J. (2009). An Improved Oracle for Dependency Parsing with Online Reordering. In ''Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)'' (pp. 73–76). Paris, France: Association for Computational Linguistics.</ref> use McNemar's test
 I saw some paper(s) use Koehn's subsampling procedure (Koehn 2004)<ref>Koehn, P. (2004). Statistical significance tests for machine translation evaluation. ''Proceedings of the Conference on Empirical Methods in Natural Language Processing'', ''4'', 388–395. http://doi.org/10.1145/2063576.2063688</ref>