Natural Language Understanding Wiki

An extended version of the ECB corpus, adding more events of the same topic to increase referential ambiguity.

Annotation quality: Upadhyay et al. (2016)[1] found over 300 annotation errors, which they detected semi-automatically as follows:

"We used the approach of Goldberg and Elhadad (2007) to semi-automatically detect annotation errors, by training an anchored SVM. First, for each pair of mentions (m_i, m_j) in the training data, we added a unique anchor feature a_ij, thus making the data linearly separable. Next, we trained an SVM classifier on all of the data with a high penalty parameter C. The classifier uses the anchor features to memorize the hard-to-classify examples, which are either genuine hard coreference pairs, or incorrect annotations. By thresholding the feature weights for the anchor features |a_ij| > δ (we use δ = 0.95), we generated a short-list of these hard cases, which we then examined by an annotator for mistakes."
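The quoted procedure can be sketched in a few lines. The snippet below is a minimal illustration on synthetic data, not the authors' actual pipeline: the feature dimensions, labels, simulated label flips, and the `δ = 0.95` threshold are placeholders. The core idea is faithful to the quote: append one unique anchor feature per example, fit a linear SVM with a high penalty `C`, and short-list examples whose anchor weight magnitude exceeds the threshold.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_pairs, n_feats = 200, 10

# Synthetic base features for each mention pair, with labels from a
# hypothetical linear separator plus 10 simulated annotation errors.
X_base = rng.normal(size=(n_pairs, n_feats))
w_true = rng.normal(size=n_feats)
y = (X_base @ w_true > 0).astype(int)
flip = rng.choice(n_pairs, size=10, replace=False)  # "mislabeled" pairs
y[flip] = 1 - y[flip]

# Anchored SVM: append a unique anchor feature a_ij per pair
# (an identity block), which makes the data linearly separable.
X = np.hstack([X_base, np.eye(n_pairs)])

# A high penalty C pushes the SVM to fit every example; the anchor
# weights absorb the examples the base features cannot explain.
clf = LinearSVC(C=1000.0, max_iter=100_000)
clf.fit(X, y)

# Threshold the anchor-weight magnitudes to short-list suspicious
# pairs for manual inspection (δ = 0.95 as in the quote).
anchor_w = np.abs(clf.coef_[0][n_feats:])
delta = 0.95
suspects = np.flatnonzero(anchor_w > delta)
print(len(suspects), "pairs flagged for review")
```

On this toy data, the flipped (mislabeled) examples tend to receive much larger anchor weights than the correctly labeled ones, which is exactly what makes the short-list useful for finding annotation mistakes.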


  1. Upadhyay, S., Gupta, N., Christodoulopoulos, C., & Roth, D. (2016). Revisiting the Evaluation for Cross Document Event Coreference. In COLING.