

Lack of labeled data: According to Resnik (1997)[1]:

"... not only the limited availability of such text at present, but skepticism that the situation will change any time soon. In marked contrast to annotated training material for part-of-speech tagging, (a) there is no coarse-level set of sense distinctions widely agreed upon (whereas part-of-speech tag sets tend to differ in the details); (b) sense annotation has a comparatively high error rate (Miller, personal communication, reports an upper bound for human annotators of around 90% for ambiguous cases, using a non-blind evaluation method that may make even this estimate overly optimistic); and (c) no fully automatic method provides high enough quality output to support the 'annotate automatically, correct manually' methodology used to provide high volume annotation by data providers like the Penn Treebank project (Marcus et al., 1993)".


  • local features which represent the local context of a word usage, that is, features of a small number of words surrounding the target word, including part-of-speech tags, word forms, positions with respect to the target word, etc.;
  • topical features which—in contrast to local features—define the general topic of a text or discourse, thus representing more general contexts (e.g., a window of words, a sentence, a phrase, a paragraph, etc.), usually as bags of words;
  • syntactic features, representing syntactic cues and argument-head relations between the target word and other words within the same sentence (note that these words might be outside the local context);
  • semantic features, representing semantic information, such as previously established senses of words in context, domain indicators, etc.
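
The first two feature types above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function names, the feature-key format, the window size, and the toy sentence with its part-of-speech tags are all invented here, and the tags are supplied by hand rather than produced by a tagger.

```python
from collections import Counter


def local_features(tokens, pos_tags, target_index, window=2):
    """Word forms and POS tags at signed offsets around the target word."""
    features = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        i = target_index + offset
        if 0 <= i < len(tokens):
            features[f"word[{offset:+d}]"] = tokens[i].lower()
            features[f"pos[{offset:+d}]"] = pos_tags[i]
    return features


def topical_features(tokens, target_index, stopwords=frozenset()):
    """Bag of words over the whole sentence, excluding the target word."""
    return Counter(
        tok.lower()
        for i, tok in enumerate(tokens)
        if i != target_index and tok.lower() not in stopwords
    )


# Toy input; the POS tags are written by hand for the example.
tokens = ["He", "sat", "on", "the", "bank", "of", "the", "river"]
pos_tags = ["PRON", "VERB", "ADP", "DET", "NOUN", "ADP", "DET", "NOUN"]
target = tokens.index("bank")

local = local_features(tokens, pos_tags, target, window=2)
topical = topical_features(tokens, target, stopwords={"he", "on", "the", "of"})
```

Here `local` records position-specific cues such as `word[-1]` and `pos[+1]`, while `topical` discards positions and keeps only which content words occur in the sentence.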


Figure caption (image not reproduced here): A space of approaches to WSD according to the amount of supervision and knowledge used. (a) fully unsupervised methods, which do not use any amount of knowledge (not even sense inventories); (b) and (c) minimally supervised and semi-supervised approaches, requiring a minimal or partial amount of supervision, respectively; (d) supervised approaches (machine-learning classifiers); most knowledge-based approaches relying on structural properties (g), such as the graph structure of semantic networks, usually use more supervision and knowledge than those based on gloss overlap (e) or methods for determining word sense dominance (f). Finally, domain-driven approaches, which often exploit hand-coded domain labels, can be placed around point (h) if they include supervised components for estimating sense probabilities, or around point (i) otherwise. Taken from Navigli (2009).




  • Lesk[2]: performs WSD based on the overlap between the context surrounding the target word to be disambiguated and the definitions of its candidate senses (Kilgarriff and Rosenzweig, 2000). Given a target word w, this method assigns to w the sense whose gloss has the highest overlap (i.e. most words in common) with the context of w, namely the set of content words co-occurring with it in a pre-defined window.
  • Simplified Lesk: the correct meaning of each word in a given context is determined individually by finding the sense whose dictionary definition overlaps most with that context. Rather than determining the meanings of all words in the context simultaneously, this approach handles each word on its own, independently of the senses of the other words in the same context.
  • Extended Lesk: Due to the limited context provided by the WordNet glosses, Banerjee and Pedersen (2003) expand the gloss of each sense s to include words from the glosses of those synsets in a semantic relation with s.
  • Degree Centrality: Starting from each sense s of the target word, it performs a depth-first search (DFS) of the WordNet graph and collects all the paths connecting s to senses of other words in context. As a result, a sentence graph is produced. A maximum search depth is established to limit the size of this graph. The sense of the target word with the highest vertex degree is selected.
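
As a rough illustration of the gloss-overlap idea behind the Lesk variants above, the following Python sketch implements the Simplified Lesk scoring on a toy example. The two-sense inventory for "bank", the stopword list, and the function names are invented for this example; a real system would take its glosses from WordNet or another dictionary.

```python
# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = frozenset(
    {"a", "an", "the", "of", "in", "on", "at", "to", "and", "or",
     "that", "is", "as", "i", "she", "he"}
)


def content_words(text):
    """Lower-cased alphabetic tokens that are not stopwords."""
    return {w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS}


def simplified_lesk(context, glosses):
    """Return the sense whose gloss shares the most content words with the context.

    `glosses` maps a sense identifier to its definition. On ties (including
    zero overlap everywhere) the first sense listed wins, which mimics
    falling back to a default sense.
    """
    context_words = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(context_words & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Toy sense inventory, written for this example only.
BANK_GLOSSES = {
    "bank/finance": "a financial institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a body of water such as a river",
}

river_sense = simplified_lesk(
    "she sat on the bank of the river and watched the water", BANK_GLOSSES
)
finance_sense = simplified_lesk("he deposited money at the bank", BANK_GLOSSES)
```

Extended Lesk would fit the same scoring loop: before computing the overlap, each gloss would be enlarged with the glosses of semantically related synsets.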


Bootstrapping: From Niu et al. (2005)[3]: "bootstrapping algorithm works by iteratively classifying unlabeled examples and adding confidently classified examples into labeled dataset using a model learned from augmented labeled dataset in previous iteration."
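
The loop described in this quotation can be sketched in a few lines of Python. This is a minimal sketch, not the procedure of any particular paper: the word-count classifier, the confidence measure, the 0.8 threshold, and the toy data are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter, defaultdict


def train(labeled):
    """Count word frequencies per sense (a deliberately tiny stand-in classifier)."""
    counts = defaultdict(Counter)
    for words, sense in labeled:
        counts[sense].update(words)
    return counts


def classify(model, words):
    """Return (best_sense, confidence); confidence is the best sense's share of the evidence."""
    scores = {sense: sum(c[w] for w in words) for sense, c in model.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0  # no known word: refuse to label
    best = max(scores, key=scores.get)
    return best, scores[best] / total


def bootstrap(labeled, unlabeled, threshold=0.8, max_iterations=10):
    """Repeatedly retrain, then move confidently classified examples into the labeled set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_iterations):
        model = train(labeled)  # model learned from the augmented set so far
        confident, remaining = [], []
        for words in unlabeled:
            sense, confidence = classify(model, words)
            if sense is not None and confidence >= threshold:
                confident.append((words, sense))
            else:
                remaining.append(words)
        if not confident:
            break  # nothing new was added, so further iterations change nothing
        labeled.extend(confident)
        unlabeled = remaining
    return train(labeled), labeled


# Toy contexts of the word "bank": two labeled seeds and four unlabeled examples.
seeds = [(["river", "water"], "river"), (["money", "loan"], "finance")]
unlabeled = [
    ["river", "fishing"],
    ["loan", "interest"],
    ["fishing", "boat"],
    ["interest", "rate"],
]
model, labeled = bootstrap(seeds, unlabeled)
```

In this toy run the examples sharing a word with a seed are labeled in the first iteration, and the remaining two only become classifiable in the second iteration, once "fishing" and "interest" have entered the labeled set.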

Label propagation: Main reference: Niu et al. (2005)[3]. It is not a method for enlarging the training data but a classifier in its own right: it assigns labels to the unlabeled (test) examples directly, i.e. it runs at test time only. Niu et al. (2005) report that it is more effective than both bootstrapping and SVMs.
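
To make the idea concrete, here is a minimal Python sketch of a generic label propagation iteration on a similarity graph: label distributions are repeatedly averaged over neighbours while the labeled seeds stay clamped. The Jaccard similarity, the fully connected graph, the fixed iteration count, and the toy data are assumptions made for this example and are not taken from Niu et al. (2005).

```python
def jaccard(a, b):
    """Word-set overlap between two contexts, in [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def label_propagation(examples, seed_labels, senses, iterations=50):
    """Propagate sense labels from seed examples to all others over a similarity graph.

    `examples` is a list of word lists; `seed_labels` maps example index -> sense.
    """
    n, k = len(examples), len(senses)
    # Edge weights between every pair of distinct examples.
    weights = [
        [jaccard(examples[i], examples[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]

    def seed_row(i):
        return [1.0 if seed_labels[i] == s else 0.0 for s in senses]

    # One-hot distributions for seeds, uniform for everything else.
    dist = [seed_row(i) if i in seed_labels else [1.0 / k] * k for i in range(n)]

    for _ in range(iterations):
        new_dist = []
        for i in range(n):
            total = sum(weights[i])
            if i in seed_labels:
                new_dist.append(seed_row(i))  # clamp labeled examples
            elif total == 0:
                new_dist.append(dist[i])  # isolated node: keep its distribution
            else:
                # Weighted average of the neighbours' current distributions.
                new_dist.append(
                    [sum(weights[i][j] * dist[j][c] for j in range(n)) / total
                     for c in range(k)]
                )
        dist = new_dist

    return [senses[max(range(k), key=row.__getitem__)] for row in dist]


# Toy contexts of "bank": examples 0 and 1 are labeled, the rest are not.
examples = [
    ["river", "water"],
    ["money", "loan"],
    ["river", "fishing"],
    ["loan", "interest"],
    ["fishing", "boat"],
    ["interest", "rate"],
]
predicted = label_propagation(
    examples, seed_labels={0: "river", 1: "finance"}, senses=["river", "finance"]
)
```

Unlike the bootstrapping loop, no classifier is retrained here: the unlabeled examples receive their labels from the graph itself, including examples such as `["fishing", "boat"]` that share no word with any seed.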

Enriching WordNet

  • WordNet++: Ponzetto & Navigli (2010)[4]
  • Using random walk: Agirre et al. (2014)[5]

Datasets


  • The SemEval-2013 task 12 dataset for multilingual WSD (Navigli et al., 2013), which consists of 13 documents in different domains, available in 5 languages. For each language, all noun occurrences were annotated using BabelNet, thereby providing Wikipedia and WordNet annotations wherever applicable. The number of mentions to be disambiguated roughly ranges from 1K to 2K per language in the different setups.
  • The SemEval-2007 task 7 dataset for coarse-grained English all-words WSD (Navigli et al., 2007). Considering only nominal mentions gives a dataset of 1107 nouns to be disambiguated using WordNet.
  • The SemEval-2007 task 17 dataset for fine-grained English all-words WSD (Pradhan et al., 2007). Considering only nominal mentions gives 158 nouns annotated with WordNet synsets.
  • The Senseval-3 dataset for English all-words WSD (Snyder and Palmer, 2004), which contains 899 nouns to be disambiguated using WordNet.
  • Babelfied English Wikipedia: from Flekova and Gurevych (2016)[6]: "Babelfied English Wikipedia (Scozzafava et al., 2015). To our knowledge, this is one of the largest published and evaluated sense-annotated corpora, containing over 500 million words, of which over 100 million are annotated with Babel synsets, with an estimated synset annotation accuracy of 77.8%."

External links

Wikipedia has an article on: Word sense disambiguation

References


  1. Resnik, P. (1997, April). Selectional preference and sense disambiguation. In Proceedings of the ACL SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What, and How (pp. 52-57).
  2. Lesk, M. (1986). Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In SIGDOC '86: Proceedings of the 5th Annual International Conference on Systems Documentation (pp. 24-26). ACM.
  3. Niu, Z.-Y., Ji, D.-H., & Tan, C. L. (2005). Word sense disambiguation using label propagation based semi-supervised learning. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (pp. 395-402). Association for Computational Linguistics.
  4. Ponzetto, S. P., & Navigli, R. (2010, July). Knowledge-rich word sense disambiguation rivaling supervised systems. In Proceedings of the 48th annual meeting of the association for computational linguistics (pp. 1522-1531). Association for Computational Linguistics.
  5. Agirre, E., de Lacalle, O. L., & Soroa, A. (2014). Random walks for knowledge-based word sense disambiguation. Computational Linguistics, 40(1), 57-84.
  6. Flekova, L., & Gurevych, I. (2016). Supersense embeddings: A unified model for supersense interpretation, prediction, and utilization. In Proceedings of ACL 2016.