Natural Language Understanding Wiki

Knowledge base

The key idea is that good embeddings for predicting triples in a knowledge base are those that cluster similar synsets together. By training a model on the knowledge base completion task, researchers hope to obtain good synset embeddings as a byproduct.

Given a triple, e.g. (cat, has_part, tail), let l denote the left synset, r the right synset, and t the relation between them.

In the literature, there are two ways to formalize this task:

  • Margin-based: A scoring function g should score observed triples higher than random triples by at least some margin. The objective therefore maximizes the gap between the score of each observed triple and the scores of its corrupted versions, up to that margin, where in L the left entities are replaced by random entities while in R the right entities are randomized.
  • Negative sampling: A function f is trained to distinguish a triple drawn from the correct distribution D from one drawn from a uniform distribution N. For example, if f gives the probability that a triple comes from D, then we minimize the negative log-likelihood.
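
Both formulations can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the exact objective of any cited paper: the margin-based objective is written as a hinge loss to be minimized (which pushes observed scores above corrupted ones by the margin), and the helper names `corrupt`, `margin_loss` and `negative_sampling_loss`, the toy scorer `g`, and the sigmoid-based `f` are all hypothetical.

```python
import math
import random


def corrupt(triple, entities, side, rng):
    """Return a copy of `triple` with its left or right entity replaced at random."""
    l, t, r = triple
    current = l if side == "left" else r
    replacement = rng.choice([e for e in entities if e != current])
    return (replacement, t, r) if side == "left" else (l, t, replacement)


def margin_loss(g, observed, corrupted, margin=1.0):
    """Hinge loss: zero once g(observed) exceeds every g(corrupted) by `margin`."""
    return sum(max(0.0, margin - g(observed) + g(c)) for c in corrupted)


def negative_sampling_loss(f, observed, noise):
    """Negative log-likelihood, where f(x) is the probability that x comes from D."""
    return -math.log(f(observed)) - sum(math.log(1.0 - f(x)) for x in noise)


# Toy scorer (an assumption for the demo): only the one known fact scores high.
KNOWN = {("cat", "has_part", "tail")}


def g(triple):
    return 2.0 if triple in KNOWN else 0.0


def f(triple):
    # Sigmoid of a shifted score, so that f can be read as a probability.
    return 1.0 / (1.0 + math.exp(-(g(triple) - 1.0)))


rng = random.Random(0)
entities = ["cat", "tail", "car", "wheel"]
x = ("cat", "has_part", "tail")
L = [corrupt(x, entities, "left", rng) for _ in range(2)]   # left entity randomized
R = [corrupt(x, entities, "right", rng) for _ in range(2)]  # right entity randomized

print(margin_loss(g, x, L + R))
print(negative_sampling_loss(f, x, L + R))
```

In this toy setup the margin loss is already zero because the observed triple outscores every corrupted triple by more than the margin; in real training, g and f would be parameterized by the embeddings and the losses minimized by gradient descent.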

Many models have been proposed (see Yang et al., 2014[1] for a review):

  1. Unstructured[2]: Treats all relations alike, without distinguishing between them.
  2. RESCAL[3]: TODO
  3. SE[4]: A relation is represented by two matrices, each acting as a linear transformation on one side of the triple.
  4. SME(LINEAR)[2]: A relation is represented by two vectors, and the semantic matching energy function compares the two sides.
  5. SME(BILINEAR)[2]: The representation of relations stays the same, but the weights are rank-3 tensors.
  6. LFM[5]: TODO
  7. TransE[6]: A relation is a translation from the left-hand side to the right-hand side.
  8. TransM[7]: Scales the score of each triple by a weight that depends on its relation.
  9. Neural tensor network[8]: Relations are represented as rank-3 tensors, and the score consists of two parts: the "tensor" part is the tensor product of the two entities with the relation, and the "neural" part adds a linear combination of the entities.
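
A few of these scoring functions can be sketched in plain Python to make the descriptions concrete. This is a minimal sketch, not the reference implementations: the function names are hypothetical, the L1 distance is an assumption (the papers also use other norms), and distances are negated so that a higher score means a more plausible triple, matching the convention for g above. The NTN form follows the commonly cited u · tanh(lᵀ W r + V [l; r] + b).

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def score_unstructured(l, r):
    """Relation is ignored: only the distance between the two entities matters."""
    return -l1(l, r)


def score_se(l, r, M_left, M_right):
    """SE: each side is transformed by its own relation matrix, then compared."""
    return -l1(matvec(M_left, l), matvec(M_right, r))


def score_transe(l, t, r):
    """TransE: the relation vector t translates l, which should land near r."""
    return -l1([a + b for a, b in zip(l, t)], r)


def score_ntn(l, r, W, V, b, u):
    """NTN: one bilinear term plus one linear term per tensor slice k."""
    hidden = [
        # l + r concatenates the two entity vectors (Python list concatenation).
        math.tanh(dot(l, matvec(W_k, r)) + dot(V_k, l + r) + b_k)
        for W_k, V_k, b_k in zip(W, V, b)
    ]
    return dot(u, hidden)


# Toy 2-dimensional embeddings (made up for the demo).
cat, tail, car = [1.0, 0.0], [1.0, 1.0], [3.0, -2.0]
has_part = [0.0, 1.0]
I = [[1.0, 0.0], [0.0, 1.0]]

print(score_transe(cat, has_part, tail) > score_transe(cat, has_part, car))
print(score_se(cat, tail, I, I) == score_unstructured(cat, tail))
print(score_ntn(cat, tail, W=[I], V=[[0.0] * 4], b=[0.0], u=[1.0]))
```

With identity matrices SE reduces to the Unstructured score, and TransE with a zero relation vector does the same, which shows how the later models generalize the earlier ones.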

Knowledge base + Text

Wang et al. (2014)[9] built separate models for entities and for words and aligned them using Wikipedia anchors or entity names.

Bordes et al. (2012)[10] train embeddings on knowledge base completion and word sense disambiguation simultaneously, thereby making use of both knowledge bases and corpora.


  1. Yang, B., Yih, W., He, X., Gao, J., & Deng, L. (2014). Embedding Entities and Relations for Learning and Inference in Knowledge Bases, 12. Computation and Language.
  2. A. Bordes, X. Glorot, J. Weston, and Y. Bengio. A semantic matching energy function for learning with multi-relational data. Machine Learning, 2013.
  3. M. Nickel, V. Tresp, and H.-P. Kriegel. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning (ICML), 2011.
  4. A. Bordes, J. Weston, R. Collobert, and Y. Bengio. Learning structured embeddings of knowledge bases. In Proceedings of the 25th Annual Conference on Artificial Intelligence (AAAI), 2011.
  5. R. Jenatton, N. Le Roux, A. Bordes, G. Obozinski, et al. A latent factor model for highly multi-relational data. In Advances in Neural Information Processing Systems (NIPS 25), 2012.
  6. Bordes, A., Usunier, N., Weston, J., & Yakhnenko, O. (2013). Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems (NIPS 26) (pp. 1–9).
  7. Miao Fan, Qiang Zhou, Emily Chang, Thomas Fang Zheng. Transition-based Knowledge Graph Embedding with Relational Mapping Properties. PACLIC'14
  8. R. Socher, D. Chen, C. D. Manning, and A. Y. Ng. Learning new facts from knowledge bases with neural tensor networks and semantic word vectors. In Advances in Neural Information Processing Systems (NIPS 26), 2013.
  9. Wang, Z., Zhang, J., Feng, J., & Chen, Z. (2014). Knowledge Graph and Text Jointly Embedding. In The 2014 Conference on Empirical Methods in Natural Language Processing. ACL – Association for Computational Linguistics.
  10. Bordes, A., & Weston, J. (2012). Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing, 22, 127–135.