Sources of information
- Lexical resources
Theoretical considerations
- Identity: maximal score for identical concepts
Triangle inequality
The triangle inequality states that if A is close to B and B is close to C, then A and C cannot be too far apart: d(A, C) ≤ d(A, B) + d(B, C). It is one of the metric axioms; a distance measure that violates it is not a proper metric.
Lin (1998) also argued that the triangle inequality is undesirable, but he used an artificial and limited example.
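Whether a given distance measure satisfies the triangle inequality can be checked by brute force over all triples of concepts. The following is a minimal sketch, not taken from any of the cited papers: the function name `triangle_violations` and the toy distance tables are hypothetical, and the distances are assumed to be symmetric and stored once per unordered pair.

```python
from itertools import permutations


def triangle_violations(dist, tol=1e-12):
    """Return ordered triples (a, b, c) with d(a, c) > d(a, b) + d(b, c).

    `dist` maps an unordered pair, stored as a tuple, to a distance.
    Symmetry is assumed, and d(x, x) is taken to be 0.
    """
    items = sorted({x for pair in dist for x in pair})

    def d(x, y):
        if x == y:
            return 0.0
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    return [
        (a, b, c)
        for a, b, c in permutations(items, 3)
        if d(a, c) > d(a, b) + d(b, c) + tol
    ]


# Hypothetical toy data: A is close to B and B is close to C,
# yet A and C are far apart, so the measure is not a metric.
toy = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 3.0}

# Hypothetical toy data that respects the inequality.
metric_like = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 1.5}

print(triangle_violations(toy))
print(triangle_violations(metric_like))
```

The check is cubic in the number of concepts, so it is only practical for small vocabularies or sampled triples.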
Similarity measures
Purely WordNet
Purely corpus-based
- Semantic Role Labeling: Fuerstenau and Lapata (2012)
- Textual Entailment: Berant et al. (2012)
- Question Answering: Surdeanu et al. (2011)
TODO: better than Spearman's rho? MaxDiff (Louviere 1991; Orme 2009) --> avoid “scale bias”?
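For reference, Spearman's rho is the Pearson correlation of the rank vectors of two score lists, so it depends only on ordering and not on the scale of the scores. The sketch below assumes the usual setup of comparing a measure's scores with human ratings for the same word pairs; the helper names and the `human` and `model` values are hypothetical illustration data, not results.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical ratings for four word pairs.
human = [9.0, 7.5, 3.0, 1.0]      # e.g. averaged annotator judgments
model = [0.80, 0.90, 0.30, 0.10]  # e.g. scores from a similarity measure

print(spearman_rho(human, model))
```

Because only ranks enter the computation, any monotone rescaling of either list leaves the value unchanged.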
- Tversky, Amos (1977). "Features of Similarity" (PDF). Psychological Review 84 (4): 327–352.
- There is also a "reverse triangle inequality" for similarity: the similarity of A to C is greater than the sum of the similarity of A to B and the similarity of B to C. However, it has been shown not to hold (Rada et al., 1989).
- Rada, R., Mili, H., Bicknell, E., & Blettner, M. (1989). Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man and Cybernetics, 19(1), 17–30.
- Lin, Dekang. 1998. An information-theoretic definition of similarity (PDF). In Proceedings of the 15th International Conference on Machine Learning, pages 296–304, July.
- Hagen Fuerstenau and Mirella Lapata. Semisupervised semantic role labeling via structural alignment. Computational Linguistics, 38(1):135–171, 2012.
- Jonathan Berant, Ido Dagan, and Jacob Goldberger. Learning entailment relations by global graph structure optimization. Computational Linguistics, 38(1):73–111, 2012.
- Mihai Surdeanu, Massimiliano Ciaramita, and Hugo Zaragoza. Learning to rank answers to non-factoid questions from web collections. Computational Linguistics, 37(2):351–383, 2011.