Natural Language Understanding Wiki


Sources of information

  • Lexical resources
    • WordNet
    • Ontologies
  • Corpora

Theoretical considerations

  • Identity: maximal score for identical concepts
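One way to test a measure against the identity property is to check that no item scores higher against another item than against itself. The sketch below assumes concepts are represented as plain numeric vectors (a hypothetical toy setup, not tied to any particular resource). Cosine similarity passes the check. The unnormalized dot product fails it, because a short vector can score higher against a longer vector than against itself.

```python
import math

def dot(u, v):
    """Unnormalized dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def satisfies_identity(sim, items, tol=1e-9):
    """True if sim(x, x) is at least sim(x, y) for every x, y in items."""
    for x in items:
        self_score = sim(x, x)
        for y in items:
            if sim(x, y) > self_score + tol:
                return False
    return True

# Hypothetical toy vectors standing in for concept representations.
vectors = [(1.0, 0.0, 2.0), (0.5, 1.0, 0.0), (3.0, 3.0, 1.0)]
print(satisfies_identity(cosine, vectors))  # True
print(satisfies_identity(dot, vectors))     # False: (0.5, 1, 0) scores 4.5 with (3, 3, 1) but 1.25 with itself
```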

Triangle inequality

Triangle inequality: if A is close to B and B is close to C, then A and C cannot be too far apart.[1][2] Formally, a distance d must satisfy d(A, C) ≤ d(A, B) + d(B, C). The triangle inequality is one of the metric axioms; if it does not hold, the distance measure is not a proper metric.
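Whether a given distance function satisfies the triangle inequality can be checked by brute force over every ordered triple of items. A minimal sketch follows; `toy_dist` and its distance table are hypothetical values chosen so that A and C lie farther apart than the detour through B allows.

```python
from itertools import product

def triangle_violations(dist, items, tol=1e-9):
    """Return every ordered triple (a, b, c) with dist(a, c) > dist(a, b) + dist(b, c)."""
    return [
        (a, b, c)
        for a, b, c in product(items, repeat=3)
        if dist(a, c) > dist(a, b) + dist(b, c) + tol
    ]

# Hypothetical symmetric distances between three concepts.
table = {frozenset("AB"): 1.0, frozenset("BC"): 1.0, frozenset("AC"): 3.0}

def toy_dist(x, y):
    return 0.0 if x == y else table[frozenset((x, y))]

print(triangle_violations(toy_dist, "ABC"))  # [('A', 'B', 'C'), ('C', 'B', 'A')]
```

Because d(A, C) = 3 exceeds d(A, B) + d(B, C) = 2, `toy_dist` is not a proper metric.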

Tversky argued that the triangle inequality does not hold for similarity judgments,[1] but Rada et al. (1989)[3] showed that his examples were inconsistent.

Lin (1998)[4] also argued that the triangle inequality is undesirable, but his argument relied on an artificial and limited example.
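Rada et al. (1989) measure the distance between two concepts as the number of edges on the shortest path between them in the semantic net. Shortest-path length in a connected undirected graph is a metric, so the triangle inequality holds by construction. The sketch below illustrates this on a hypothetical toy is-a hierarchy (`IS_A` and the helper names are invented for illustration) and confirms that no triple violates the inequality. It is a simplified illustration, not Rada et al.'s own implementation.

```python
from collections import deque
from itertools import product

# Hypothetical toy is-a hierarchy: child -> parent.
IS_A = {
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal",
}

def build_graph(is_a):
    """Treat each is-a link as an undirected edge."""
    graph = {}
    for child, parent in is_a.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph

def path_distance(graph, source, target):
    """Number of edges on the shortest path between two concepts (breadth-first search)."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == target:
            return depth
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    raise ValueError(f"{source!r} and {target!r} are not connected")

graph = build_graph(IS_A)
print(path_distance(graph, "dog", "wolf"))     # 2
print(path_distance(graph, "dog", "sparrow"))  # 5

# Shortest-path length is a metric, so no triple should violate the inequality.
concepts = sorted(graph)
violations = [
    (a, b, c)
    for a, b, c in product(concepts, repeat=3)
    if path_distance(graph, a, c) > path_distance(graph, a, b) + path_distance(graph, b, c)
]
print(violations)  # []
```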

Similarity measures

Purely WordNet

Purely corpus-based

Hybrid

Applications

  • Semantic Role Labeling: Fuerstenau and Lapata (2012)[5]
  • Textual Entailment: Berant et al. (2012)[6]
  • Question Answering: Surdeanu et al. (2011)[7]

Evaluation

TODO: Is there a better evaluation measure than Spearman's rho? MaxDiff (Louviere 1991; Orme 2009) might avoid “scale bias”.
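Similarity measures are commonly evaluated by correlating their scores with human similarity ratings for a set of word pairs. A minimal sketch of Spearman's rho, computed as the Pearson correlation of the two rank vectors, with tied values given their average rank. The score lists are invented for illustration and do not come from any real dataset.

```python
import math

def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(system_scores, human_scores):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(system_scores), average_ranks(human_scores))

# Hypothetical scores for five word pairs.
human = [9.1, 7.4, 5.0, 2.3, 0.8]
system = [0.92, 0.60, 0.71, 0.15, 0.10]
print(round(spearman_rho(system, human), 3))  # 0.9
```

Only the rank order of each list matters, so the system scores and human ratings need not share a scale; this is why rank correlation is a common choice here.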

References

  1. Tversky, A. (1977). Features of similarity. Psychological Review, 84(4), 327–352.
  2. There is also a "reverse triangle inequality" for similarity: the similarity of A to C is greater than the sum of the similarity of A to B and the similarity of B to C. However, it has been shown not to hold (Rada et al., 1989).
  3. Rada, R., Mili, H., Bicknell, E., & Blettner, M. (1989). Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1), 17–30.
  4. Lin, D. (1998, July). An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, pp. 296–304.
  5. Fuerstenau, H., & Lapata, M. (2012). Semisupervised semantic role labeling via structural alignment. Computational Linguistics, 38(1), 135–171.
  6. Berant, J., Dagan, I., & Goldberger, J. (2012). Learning entailment relations by global graph structure optimization. Computational Linguistics, 38(1), 73–111.
  7. Surdeanu, M., Ciaramita, M., & Zaragoza, H. (2011). Learning to rank answers to non-factoid questions from web collections. Computational Linguistics, 37(2), 351–383.

External links