Natural Language Understanding Wiki

Given a probability function over atoms (e.g. one obtained by matrix factorization), Rocktäschel et al. (2015)[1] pointed out that we can define the probability of virtually any first-order logic formula. Writing [F] for the probability of a formula F:

  • Atom: [A], [B]
  • And: [A∧B] = [A][B] (this assumes A and B are independent; the assumption is clearly violated when A and B share a constant, and there isn't a solution for such cases yet)
  • Or: [A∨B] = [A]+[B]−[A][B]
  • Implication: [A⇒B] = [A]([B]−1)+1
  • etc.
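
The rules above can be sketched as plain functions on probabilities. This is a minimal illustration, not code from the paper: the function names are hypothetical, and the negation rule [¬A] = 1 − [A] is not in the list above but is the standard complement, used here only to cross-check the implication formula against [¬A∨B].

```python
# Minimal sketch of the connective rules; all function names are hypothetical.

def p_not(a: float) -> float:
    """[¬A] = 1 − [A] (assumed standard complement, not listed above)."""
    return 1.0 - a

def p_and(a: float, b: float) -> float:
    """[A∧B] = [A][B], valid only if A and B are independent."""
    return a * b

def p_or(a: float, b: float) -> float:
    """[A∨B] = [A] + [B] − [A][B]."""
    return a + b - a * b

def p_implies(a: float, b: float) -> float:
    """[A⇒B] = [A]([B] − 1) + 1."""
    return a * (b - 1.0) + 1.0

if __name__ == "__main__":
    a, b = 0.9, 0.4
    # A⇒B is logically equivalent to ¬A∨B, so the two values should agree
    # up to floating-point rounding.
    print(p_implies(a, b), p_or(p_not(a), b))
```

On crisp truth values (0 and 1) these functions reduce to the usual Boolean truth tables, which is a quick sanity check for any additional rule.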

The rules listed above use only sums and products of atom probabilities. Therefore, if the probability of atoms is defined using a differentiable function, the probability of a formula built from these rules is differentiable too.

Gradient-based optimization methods can be applied to learn embeddings jointly from atoms and other logic formulae.
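
As a rough illustration of joint learning, the sketch below runs plain gradient descent on one observed atom A and one rule A⇒B. It rests on several assumptions that are not stated in the text above: atom probabilities are modelled as a sigmoid of the dot product between a relation embedding and an entity-pair embedding, the loss is the negative log-probability of the atom plus that of the rule, and the embedding size, initial values, learning rate and step count are arbitrary. All names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def forward(r_a, r_b, t):
    """Return [A], [B], [A⇒B] and the loss −log[A] − log[A⇒B]."""
    p_a = sigmoid(dot(r_a, t))          # assumed atom model: sigmoid(r · t)
    p_b = sigmoid(dot(r_b, t))
    p_imp = p_a * (p_b - 1.0) + 1.0     # [A⇒B] = [A]([B] − 1) + 1
    loss = -math.log(p_a) - math.log(p_imp)
    return p_a, p_b, p_imp, loss

def gradients(r_a, r_b, t):
    """Gradient of the loss w.r.t. each embedding, by the chain rule."""
    p_a, p_b, p_imp, _ = forward(r_a, r_b, t)
    dl_dpa = -1.0 / p_a - (p_b - 1.0) / p_imp
    dl_dpb = -p_a / p_imp
    g_a = dl_dpa * p_a * (1.0 - p_a)    # through the sigmoid for A
    g_b = dl_dpb * p_b * (1.0 - p_b)    # through the sigmoid for B
    grad_r_a = [g_a * x for x in t]
    grad_r_b = [g_b * x for x in t]
    grad_t = [g_a * xa + g_b * xb for xa, xb in zip(r_a, r_b)]
    return grad_r_a, grad_r_b, grad_t

def train(r_a, r_b, t, lr=0.1, steps=500):
    """Plain gradient descent on all three embeddings."""
    for _ in range(steps):
        g_ra, g_rb, g_t = gradients(r_a, r_b, t)
        r_a = [x - lr * g for x, g in zip(r_a, g_ra)]
        r_b = [x - lr * g for x, g in zip(r_b, g_rb)]
        t = [x - lr * g for x, g in zip(t, g_t)]
    return r_a, r_b, t

# Arbitrary small initial embeddings: relation A, relation B, entity pair.
init = ([0.1, -0.2], [-0.3, 0.1], [0.2, 0.4])
_, p_b_before, _, loss_before = forward(*init)
trained = train(*init)
_, p_b_after, _, loss_after = forward(*trained)
print("loss:", loss_before, "->", loss_after)
print("[B]: ", p_b_before, "->", p_b_after)
```

Note that B is never observed directly in this sketch: its probability can only change through the gradient of the rule term, which is the sense in which the formula injects knowledge into the embeddings. In practice an automatic differentiation library would replace the hand-written `gradients` function.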


  1. Rocktäschel, T., Singh, S., & Riedel, S. (2015). Injecting Logical Background Knowledge into Embeddings for Relation Extraction. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1119–1129). Association for Computational Linguistics.