Some tasks inherently deal with multiple sentences, such as coreference resolution, temporal relation classification, and implicit semantic role labeling. Early work often used a local, pair-wise treatment:

  • Database record-linkage and citation reference matching: Monge & Elkan, 1997; McCallum et al., 2000; Bilenko & Mooney, 2002; Cohen & Richman, 2002
  • Coreference resolution: McCarthy & Lehnert, 1995; Ge et al., 1998; Soon et al., 2001; Ng & Cardie, 2002
  • Implicit semantic role labeling: Schenk & Chiarcos, 2016[1]
  • Temporal relation classification: Bethard & Martin, 2007[2]

However, researchers quickly came up with various ways to encode and exploit cross-sentence dependencies.

TODO: Luo et al. (2004), who used a Bell tree whose leaves represent possible partitionings of the mentions into entities and then trained a model for searching the tree; Ng (2005), who took a reranking approach.

Which tasks require cross-sentence dependency?

There is evidence from psycholinguistics that syntactic parsing and lower-level tasks do not require global information (Hernández-Gutiérrez et al., 2016[3]). Semantic role labeling seems to lie on the border: empirically, especially in the PropBank formalism, it can be solved quite well within sentence boundaries, but intuitively some global information might improve performance further. Systems that work within sentence boundaries already achieve quite high results (~80%), considering that inter-annotator agreement might not be much higher than 90% (TODO).

Types of dependency

Trivial hard constraints

Some cross-sentence constraints are obvious and can be captured by simple rules enforced by Integer Linear Programming (ILP).

Transitivity: From Finkel and Manning (2008)[4]: "If John Smith was labeled coreferent with Smith, and Smith with Jane Smith, then John Smith and Jane Smith were also coreferent regardless of the classifier’s evaluation of that pair." They use ILP to perform inference.
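To make the transitivity constraint concrete, here is a minimal sketch, not the cited authors' implementation. It assumes one binary variable x[i, j] per mention pair and the commonly used triangle inequality x[i, j] + x[j, k] - x[i, k] <= 1 (in all three rotations), and it replaces the ILP solver with exhaustive enumeration, which is only feasible for a handful of mentions. The function name and the pairwise scores are hypothetical.

```python
from itertools import combinations, product


def best_transitive_labeling(n_mentions, score):
    """Pick 0/1 link labels for all mention pairs that maximise the summed
    score of linked pairs, subject to transitivity.

    `score` maps a pair (i, j) with i < j to a signed number: positive means
    the (hypothetical) pairwise classifier favours linking i and j.
    """
    pairs = list(combinations(range(n_mentions), 2))
    triples = list(combinations(range(n_mentions), 3))
    best, best_total = None, float("-inf")
    for bits in product([0, 1], repeat=len(pairs)):
        x = dict(zip(pairs, bits))
        # The three triangle inequalities for (i, j, k) hold together exactly
        # when the triple does not contain exactly two links.
        if any(x[i, j] + x[j, k] + x[i, k] == 2 for i, j, k in triples):
            continue
        total = sum(score[p] * x[p] for p in pairs)
        if total > best_total:
            best, best_total = x, total
    return best


# 0 = "John Smith", 1 = "Smith", 2 = "Jane Smith"
scores = {(0, 1): 2.0, (1, 2): 1.5, (0, 2): -1.0}
labels = best_transitive_labeling(3, scores)
print(labels)
```

With these made-up scores, the pair (0, 2) ends up linked even though its own score is negative, because linking all three mentions scores higher than any other transitive labeling. This mirrors the situation described in the quote above.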

Denis and Baldridge (2009)[5] solve coreference resolution, anaphora resolution and named-entity classification in a joint global model. Their constraints "ensure: (i) coherence between the final decisions of the three local models, and (ii) transitivity of multiple coreference decisions." They also use ILP to perform inference.

Uryupina & Moschitti (2015)[6] propose a simple decoding algorithm for mention-pair coreference resolution that, when combined with pre-filtering and feature combination, achieves 61.82% on CoNLL. They link or "unlink" the pairs that are classified with the highest confidence first and then proceed toward the least confident ones, maintaining "unlinking" constraints (i.e. a pair that was "unlinked" never ends up in the same cluster). This can be seen as combining transitivity with easy-first decoding.
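The decoding idea can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: it assumes each pair comes with a probability of coreference, takes the distance from 0.5 as the confidence, and uses a union-find structure with cannot-link sets. All names are hypothetical.

```python
def easy_first_decode(n_mentions, scored_pairs):
    """Cluster mentions from pairwise decisions, most confident first.

    `scored_pairs` is a list of (i, j, p), where p is the assumed probability
    that mentions i and j corefer. Returns a sorted list of clusters.
    """
    parent = list(range(n_mentions))
    # For each cluster root: roots of clusters it must never be merged with.
    cannot = {m: set() for m in range(n_mentions)}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    by_confidence = sorted(scored_pairs, key=lambda t: abs(t[2] - 0.5), reverse=True)
    for i, j, p in by_confidence:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # earlier, more confident decisions already linked them
        if p < 0.5:
            # "Unlink": record a constraint between the two clusters.
            cannot[ri].add(rj)
            cannot[rj].add(ri)
        elif rj not in cannot[ri]:
            # Link: merge rj into ri and move rj's constraints over to ri.
            parent[rj] = ri
            for r in cannot.pop(rj):
                cannot[r].discard(rj)
                cannot[r].add(ri)
                cannot[ri].add(r)

    clusters = {}
    for m in range(n_mentions):
        clusters.setdefault(find(m), []).append(m)
    return sorted(clusters.values())


pairs = [(0, 1, 0.95), (1, 2, 0.10), (0, 2, 0.70), (2, 3, 0.80)]
print(easy_first_decode(4, pairs))  # [[0, 1], [2, 3]]
```

In this toy input, the confident "unlink" decision for (1, 2) blocks the weaker positive decision for (0, 2), because mention 0 already shares a cluster with mention 1.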

Do et al. (2012)[7] use ILP to enforce constraints in timeline construction (e.g. each event is associated with only one time interval, reflexivity and transitivity constraints on the relations among event mentions).

Flat features

Flat features are features represented as a list; they can be used with standard classifiers such as SVMs and maximum entropy models.

Ng (2005)[8] uses an SVM to rerank coreference partitions (generated by local classifiers). Each partition is encoded by features similar to those of local classifiers, but differentiating between coreferent and non-coreferent pairs. This way, the model tries to capture, e.g., "the probability that two NPs residing in the same cluster have incompatible gender values."
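As an illustration of such a partition-level feature, the sketch below computes the fraction of same-cluster mention pairs whose gender values disagree. It is a simplified, hypothetical example: the function name, the gender lookup, and the treatment of unknown gender as compatible are assumptions, and the actual feature set in Ng (2005) may be defined differently.

```python
from itertools import combinations


def incompatible_gender_rate(partition, gender):
    """Fraction of same-cluster mention pairs with conflicting known genders.

    `partition` is a list of clusters (lists of mention strings); `gender`
    maps a mention to "m" or "f". Mentions missing from `gender` are treated
    as compatible with everything (an assumption of this sketch).
    """
    same_cluster_pairs = 0
    incompatible = 0
    for cluster in partition:
        for a, b in combinations(cluster, 2):
            same_cluster_pairs += 1
            ga, gb = gender.get(a), gender.get(b)
            if ga is not None and gb is not None and ga != gb:
                incompatible += 1
    if same_cluster_pairs == 0:
        return 0.0  # only singleton clusters: nothing to compare
    return incompatible / same_cluster_pairs


gender = {"John Smith": "m", "he": "m", "Jane Smith": "f", "she": "f"}
good = [["John Smith", "he"], ["Jane Smith", "she"]]
bad = [["John Smith", "she"], ["Jane Smith", "he"]]
print(incompatible_gender_rate(good, gender))  # 0.0
print(incompatible_gender_rate(bad, gender))   # 1.0
```

A reranker would place a value like this in the flat feature vector of each candidate partition, so that partitions with many gender clashes can be scored lower.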

Graph-based heuristics

TODO: Babelfy (Moro et al., 2014)[9], many more

Logic

TODO: Berant, J., Srikumar, V., Chen, P.-C., Linden, A. Vander, Harding, B., Huang, B., … Manning, C. D. (2014). Modeling Biological Processes for Reading Comprehension. In Empirical Methods in Natural Language Processing (EMNLP).

There is a substantial body of research on using Markov logic in NLP, for example in coreference resolution (Culotta et al., 2007[10]; Poon and Domingos, 2008[11]). TODO: entity resolution

(Weighted) abduction: applied to coreference resolution by Inoue et al. (2012)[12]. For more: "For examples of the application of weighted abduction to discourse processing see (Charniak and Goldman, 1991; Inoue and Inui, 2011; Ovchinnikova et al., 2011; Ovchinnikova, 2012)." (Inoue et al., 2012)

(Probabilistic) scripts

Researchers have started using scripts to solve hard cases in coreference resolution.

Sequence models

TODO: conditional random fields (McCallum and Wellner, 2004)[13]

References

  1. Schenk, N., & Chiarcos, C. (2016). Unsupervised Learning of Prototypical Fillers for Implicit Semantic Role Labeling. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1473–1479). San Diego, California: Association for Computational Linguistics.
  2. Bethard, S., & Martin, J. H. (2007). CU-TMP: Temporal Relation Classification Using Syntactic and Semantic Features. Proceedings of the 4th International Workshop on Semantic Evaluations, (June), 129–132. Excerpt: "We approach these tasks as pair-wise classification problems, where each event/time pair is assigned one of the TempEval relation classes (BEFORE, AFTER, etc.)."
  3. Hernández-Gutiérrez, David, et al. "Do discourse global coherence and cumulated information impact on sentence syntactic processing? An event-related brain potentials study." Brain research 1630 (2016): 109-119.
  4. Finkel, J. R., & Manning, C. D. (2008). Enforcing Transitivity in Coreference Resolution. In Proceedings of ACL-08: HLT, Short Papers (pp. 45–48). Association for Computational Linguistics.
  5. Denis, P., & Baldridge, J. (2009). Global joint models for coreference resolution and named entity classification. Procesamiento del Lenguaje Natural, 42(1), 87–96.
  6. Uryupina, O., & Moschitti, A. (2015). A State-of-the-Art Mention-Pair Model for Coreference Resolution. SemEval2015, 289–298.
  7. Do, Q. X., Lu, W., & Roth, D. (2012). Joint Inference for Event Timeline Construction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (pp. 677–687). Stroudsburg, PA, USA: Association for Computational Linguistics.
  8. Ng, V. (2005). Machine Learning for Coreference Resolution: From Local Classification to Global Ranking. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL05), (June), 157–164. doi:10.3115/1219840.1219860
  9. Moro, A., Raganato, A., & Navigli, R. (2014). Entity Linking meets Word Sense Disambiguation: A Unified Approach. Transactions of the Association for Computational Linguistics, 2, 231–244.
  10. Culotta, A., Wick, M., & McCallum, A. (2007). First-Order Probabilistic Models for Coreference Resolution. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference (pp. 81–88). Rochester, New York: Association for Computational Linguistics.
  11. Poon, H., & Domingos, P. (2008). Joint Unsupervised Coreference Resolution with Markov Logic. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 650–659). Stroudsburg, PA, USA: Association for Computational Linguistics.
  12. Inoue, N., Ovchinnikova, E., Inui, K., & Hobbs, J. (2012). Coreference Resolution with ILP-based Weighted Abduction. In Proceedings of COLING 2012 (pp. 1291–1308). Mumbai, India: The COLING 2012 Organizing Committee.
  13. McCallum, A., & Wellner, B. (2004). Conditional Models of Identity Uncertainty with Application to Noun Coreference. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in Neural Information Processing Systems 17 (pp. 905–912). Cambridge, MA: MIT Press.