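MEN[1] is a benchmark of 3,000 word pairs with human relatedness ratings, used to evaluate models of word similarity/relatedness. Models are scored by Spearman's $\rho$ between their similarity scores and the human ratings.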

| Model | Year | Source of information | Spearman's $\rho$ | Paper | Notes |
|---|---|---|---|---|---|
| CNN-Mean | 2014 | Text+Image | 0.70 | Kiela & Bottou, 2014[2] | ImageNet visual features |
| CNN-Mean | 2014 | Text+Image | 0.71 | Kiela & Bottou, 2014[2] | ESP game visual features |
| W2-SIFT | 2012 | Text+Image | 0.69 | Bruni et al., 2012[1] | SIFT features |
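
The Spearman's $\rho$ column compares a model's ranking of word pairs with the ranking given by human ratings. The following is a minimal sketch of that computation, not the evaluation code of any paper listed here. It assumes the model scores each pair by cosine similarity of word vectors, which is a common choice but is an assumption on this page. The vectors, word pairs, and ratings are made-up toy values, not MEN data, and the names `average_ranks`, `spearman_rho`, `cosine`, and `evaluate` are hypothetical helpers.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def evaluate(vectors, gold):
    """Correlate model cosine similarities with human ratings."""
    model = [cosine(vectors[w1], vectors[w2]) for w1, w2, _ in gold]
    human = [rating for _, _, rating in gold]
    return spearman_rho(model, human)

# Toy data for illustration only (assumption: NOT taken from MEN).
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.3, 0.7, 0.4],
    "banana": [0.0, 0.1, 0.9],
}
toy_gold = [
    ("cat", "kitten", 45.0),
    ("car", "truck", 40.0),
    ("cat", "car", 12.0),
    ("kitten", "banana", 5.0),
]

if __name__ == "__main__":
    print(f"Spearman's rho on toy data: {evaluate(toy_vectors, toy_gold):.2f}")
```

Because only ranks matter, the scale of the human ratings and of the model scores does not affect the result. In practice a library routine such as `scipy.stats.spearmanr` can replace the hand-written rank correlation.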

References

  1. Bruni, E., Boleda, G., Baroni, M., & Tran, N. K. (2012). Distributional semantics in technicolor. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, Volume 1 (pp. 136-145). Association for Computational Linguistics.
  2. Kiela, D., & Bottou, L. (2014). Learning image embeddings using convolutional neural networks for improved multi-modal semantics. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 36-45).