ACE 2004 (gold mentions)

| Model | MUC (P / R / F1) | B3 (P / R / F1) | CEAFφ4 (P / R / F1) | Avg. F1 | Notes | Ref |
|---|---|---|---|---|---|---|
| First-Order MIRA | - | 86.7 / 73.2 / 79.3 | - | - | | Culotta et al. (2007)[1] |
| Bengtson and Roth (2008) | - | 88.3 / 74.5 / 80.8 | - | - | | Bengtson and Roth (2008)[2] |
| Chang et al. (2013) (Illinois)[3] | - | - | - | 79.42 | | Peng et al. (2015)[4] |
| Lee et al. (2011) (Stanford Sieve)[5] | - | - | - | 81.05 | | Peng et al. (2015)[4] |

ACE 2004 (predicted mentions)

| Model | MUC F1 | B3 F1 | CEAFφ4 F1 | Avg. F1 | Notes | Ref |
|---|---|---|---|---|---|---|
| Lee et al. (2011) (Stanford Sieve)[5] | 63.89 | 70.33 | 70.21 | 68.14 | | Peng et al. (2015)[4] |
| Peng et al. (2015) (Illinois) | 67.28 | 73.06 | 73.25 | 71.20 | Table 2, evaluated using complete mentions instead of mention heads | Peng et al. (2015)[4] |
| Discourse-driven LM | - | - | - | 71.79 | | Peng and Roth (2016)[6] |
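The Avg. F1 reported in these tables is the unweighted mean of the MUC, B3, and CEAFφ4 F1 scores (the official CoNLL scoring scheme). A minimal Python sketch (the helper name is mine) that reproduces the Lee et al. (2011) row above:

```python
def conll_avg_f1(muc_f1, b3_f1, ceaf_f1):
    """Unweighted mean of the MUC, B3, and CEAFphi4 F1 scores,
    rounded to two decimals as in the tables on this page."""
    return round((muc_f1 + b3_f1 + ceaf_f1) / 3, 2)

# Lee et al. (2011), Stanford Sieve: MUC 63.89, B3 70.33, CEAFphi4 70.21
print(conll_avg_f1(63.89, 70.33, 70.21))  # 68.14, matching the Avg. F1 column
```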

ACE 2005 (Stoyanov et al.'s split)

Note that:

  • This set contains only the newswire portion of ACE 2005.
  • Stoyanov et al. (2009)[7] did not specify document names, so the numbers here may not be strictly comparable. (They state: "When available, we use the standard test/train split. Otherwise, we randomly split the data into a training and test set following a 70/30 ratio.")
  • Train: 57 documents; test: 24 documents.
| Model | MUC (P / R / F1) | B3 All (P / R / F1) | B3 None (P / R / F1) | Pairwise (P / R / F1) | Ref. |
|---|---|---|---|---|---|
| Haghighi and Klein (2010)[8] | 74.6 / 62.7 / 68.1 | 83.2 / 68.4 / 75.1 | 82.7 / 66.3 / 73.6 | 64.3 / 41.4 / 50.4 | Haghighi and Klein (2010)[8] |
| Haghighi and Klein (2009)[9] | 73.1 / 58.8 / 65.2 | 82.1 / 63.9 / 71.8 | 81.2 / 61.6 / 70.1 | 66.1 / 37.9 / 48.1 | |
| Stoyanov et al. (2009)[10] | - / - / 67.4 | - / - / 73.7 | - / - / 72.5 | - | |

ACE 2005 (Rahman and Ng's split)

Note that:

  • Created from documents originally designated for training (only ACE 2005 participants have access to the official test set).
  • Balanced across six genres.
  • The authors did not publish the document names.
  • Train: 482 documents; test: 117 documents (an 80/20 split).
| Model | MUC (P / R / F1) | B3 All (P / R / F1) | B3 None (P / R / F1) | Pairwise (P / R / F1) | Ref. |
|---|---|---|---|---|---|
| Haghighi and Klein (2010)[8] | 77.0 / 66.9 / 71.6 | 55.4 / 74.8 / 63.8 | 54.0 / 74.7 / 62.7 | 60.1 / 47.7 / 53.0 | Haghighi and Klein (2010)[8] |
| Haghighi and Klein (2009)[9] | 72.9 / 60.2 / 67.0 | 53.2 / 73.1 / 61.6 | 52.0 / 72.6 / 60.6 | 57.0 / 44.6 / 50.0 | |
| Rahman and Ng (2009)[11] | 75.4 / 64.1 / 69.3 | - | 54.4 / 70.5 / 61.4 | - | |

CoNLL 2012 English

| Model | MUC (P / R / F1) | B3 (P / R / F1) | CEAFφ4 (P / R / F1) | Avg. F1 | Notes | Ref |
|---|---|---|---|---|---|---|
| Lee et al. (2013)[12] | 65.08 / 62.41 / 63.72 | 50.23 / 54.08 / 52.08 | - | - | | Lee et al. (2017)[13] |
| Fernandes et al. (2012)[14] | 65.83 / 75.91 / 70.51 | 51.55 / 65.19 / 57.58 | - | - | | Lee et al. (2017)[13] |
| Durrett and Klein (2013)[15] | 66.58 / 74.94 / 70.51 | 53.2 / 64.56 / 58.33 | - | - | | Lee et al. (2017)[13] |
| Björkelund and Kuhn (2014)[16] | 67.46 / 74.3 / 70.72 | 54.96 / 62.71 / 58.58 | - | - | | Lee et al. (2017)[13] |
| Durrett and Klein (2014)[17] | 69.91 / 72.61 / 71.24 | 56.43 / 61.18 / 58.71 | - | - | | Lee et al. (2017)[13] |
| Discourse-driven LM | - | - | - | 63.46 | Table 5, Base+EC-LB (pc +em) | Peng and Roth (2016)[6] |
| Wiseman et al. (2016)[18] | 77.49 / 69.75 / 73.42 | 66.83 / 56.95 / 61.50 | 62.14 / 53.85 / 57.70 | 64.21 | table 1 | Clark & Manning (2016b)[19] |
| Clark & Manning (2016a)[20] | 79.91 / 69.30 / 74.23 | 71.01 / 56.53 / 62.95 | 63.84 / 54.33 / 58.70 | 65.29 | table 1 | Clark & Manning (2016b)[19] |
| Clark & Manning (2016b)[19] | 79.19 / 70.44 / 74.56 | 69.93 / 57.99 / 63.40 | 63.46 / 55.52 / 59.23 | 65.73 | Reward Rescaling, table 1 | Clark & Manning (2016b)[19] |
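Each metric's F1 in these tables is the harmonic mean of its precision and recall, so individual P/R/F1 triples can be sanity-checked. A small sketch (the helper name is mine), using the Clark & Manning (2016b) MUC numbers above:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall, rounded to two decimals."""
    return round(2 * precision * recall / (precision + recall), 2)

# Clark & Manning (2016b), MUC column: P = 79.19, R = 70.44
print(f1(79.19, 70.44))  # 74.56, matching the MUC F1 in the table
```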

CoNLL 2012 Chinese

| Model | MUC (P / R / F1) | B3 (P / R / F1) | CEAFφ4 (P / R / F1) | Avg. F1 | Notes | Ref |
|---|---|---|---|---|---|---|
| Björkelund & Kuhn (2014)[16] | 69.39 / 62.57 / 65.80 | 61.64 / 53.87 / 57.49 | 59.33 / 54.65 / 56.89 | 60.06 | | Clark & Manning (2016b)[19], table 1 |
| Clark & Manning (2016a)[20] | 73.85 / 65.42 / 69.38 | 67.53 / 56.41 / 61.47 | 62.84 / 57.62 / 60.12 | 63.66 | | Clark & Manning (2016b)[19], table 1 |
| Clark & Manning (2016b)[19] | 73.64 / 65.62 / 69.40 | 67.48 / 56.94 / 61.76 | 62.46 / 58.60 / 60.47 | 63.88 | Reward Rescaling | Clark & Manning (2016b)[19], table 1 |

References

  1. A. Culotta, M. Wick, R. Hall, and A. McCallum. 2007. First-order probabilistic models for coreference resolution. In NAACL.
  2. E. Bengtson and D. Roth. 2008. Understanding the value of features for coreference resolution. In EMNLP.
  3. K.-W. Chang, R. Samdani, and D. Roth. 2013. A constrained latent variable model for coreference resolution. In EMNLP.
  4. Peng, H., Chang, K.-W., & Roth, D. (2015). A Joint Framework for Coreference Resolution and Mention Head Detection. In CoNLL, 12–21.
  5. H. Lee, Y. Peirsman, A. Chang, N. Chambers, M. Surdeanu, and D. Jurafsky. 2011. Stanford's multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. In Proceedings of the CoNLL-2011 Shared Task.
  6. Peng, H., & Roth, D. (2016). Two Discourse Driven Language Models for Semantics. In ACL 2016, 290–300.
  7. Stoyanov, V., Gilbert, N., Cardie, C., & Riloff, E. (2009). Conundrums in Noun Phrase Coreference Resolution: Making Sense of the State-of-the-Art. In ACL 2009, 656–664. http://doi.org/10.3115/1690219.1690238
  8. Haghighi, A., & Klein, D. (2010). Coreference resolution in a modular, entity-centered model. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL, 385–393. http://doi.org/10.3115/1608810.1608821
  9. Aria Haghighi and Dan Klein. 2009. Simple Coreference Resolution with Rich Syntactic and Semantic Features. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing.
  10. V. Stoyanov, N. Gilbert, C. Cardie, and E. Riloff. 2009. Conundrums in Noun Phrase Coreference Resolution: Making Sense of the State-of-the-art. In Association for Computational Linguistics (ACL).
  11. A. Rahman and V. Ng. 2009. Supervised models for coreference resolution. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing.
  12. Lee, H., Chang, A., Peirsman, Y., Chambers, N., Surdeanu, M., and Jurafsky, D. 2013. Deterministic coreference resolution based on entity-centric, precision-ranked rules. Computational Linguistics 39(4): 885–916.
  13. Lee, Heeyoung, Mihai Surdeanu, and Dan Jurafsky. 2017. "A scaffolding approach to coreference resolution integrating statistical and rule-based models." Natural Language Engineering: 1–30.
  14. Fernandes, E. R., dos Santos, C. N., and Milidiú, R. L. 2012. Latent structure perceptron with feature induction for unrestricted coreference resolution. In EMNLP-CoNLL, Jeju, Republic of Korea, pp. 41–48.
  15. Durrett, G., and Klein, D. 2013. Easy victories and uphill battles in coreference resolution. In Proceedings of EMNLP 2013, Seattle, Washington.
  16. Anders Björkelund and Jonas Kuhn. 2014. Learning structured perceptrons for coreference resolution with latent antecedents and non-local features. In Association for Computational Linguistics (ACL).
  17. Durrett, G., and Klein, D. 2014. A joint model for entity analysis: coreference, typing, and linking. TACL 2: 477–490.
  18. Sam Wiseman, Alexander M. Rush, and Stuart M. Shieber. 2016. Learning global features for coreference resolution. In Human Language Technology and North American Association for Computational Linguistics (HLT-NAACL).
  19. Clark, K., & Manning, C. D. (2016b). Deep Reinforcement Learning for Mention-Ranking Coreference Models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 2256–2262. http://arxiv.org/abs/1609.08667
  20. Kevin Clark and Christopher D. Manning. 2016a. Improving coreference resolution by learning entity-level distributed representations. In Association for Computational Linguistics (ACL).