Given a probability function over atoms (e.g. one defined by matrix factorization), Rocktäschel et al. (2015)^{[1]} pointed out that we can define the probability of virtually any first-order logic formula. Writing [A] for the probability of a formula A:

- Atom: [A], [B]
- And: [A∧B] = [A][B]. This assumes that A and B are independent. The assumption is clearly violated when A and B share a constant, and there is no solution for such cases yet.
- Or: [A∨B] = [A]+[B]−[A][B]
- Implication: [A⇒B] = [A]([B]−1)+1
- etc.
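
The rules in the list above can be sketched as a few plain Python functions. This is a minimal illustration, not the implementation from the paper: the function names are hypothetical, and the negation rule [¬A] = 1 − [A] does not appear in the list; it is assumed here as the usual complement rule so that the other rules can be cross-checked against it.

```python
def p_not(a: float) -> float:
    """[¬A] = 1 - [A] (assumed complement rule, not in the list above)."""
    return 1.0 - a


def p_and(a: float, b: float) -> float:
    """[A∧B] = [A][B], valid only if A and B are independent."""
    return a * b


def p_or(a: float, b: float) -> float:
    """[A∨B] = [A] + [B] - [A][B]."""
    return a + b - a * b


def p_implies(a: float, b: float) -> float:
    """[A⇒B] = [A]([B] - 1) + 1."""
    return a * (b - 1.0) + 1.0


# Cross-checks on arbitrary example probabilities [A] and [B].
a, b = 0.8, 0.5
# De Morgan: A∨B is equivalent to ¬(¬A ∧ ¬B).
or_via_de_morgan = p_not(p_and(p_not(a), p_not(b)))
# A⇒B is equivalent to ¬(A ∧ ¬B).
implies_via_not_and = p_not(p_and(a, p_not(b)))
```

Under the independence assumption, the Or and Implication rules are not separate axioms: both follow from the And rule combined with negation, which is what the two cross-checks compute.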

Therefore, if the probability of atoms is defined using a differentiable function, the probability of any formula is generally differentiable too.
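
To make this concrete, the product and chain rules give the gradient of each connective in terms of the gradients of its parts. Here θ is an assumed symbol for any parameter of the atom probability function (for example, one coordinate of an embedding); it is not notation from the original paper.

```latex
\begin{align*}
\frac{\partial [A \wedge B]}{\partial \theta}
  &= [B]\,\frac{\partial [A]}{\partial \theta}
   + [A]\,\frac{\partial [B]}{\partial \theta} \\
\frac{\partial [A \vee B]}{\partial \theta}
  &= (1 - [B])\,\frac{\partial [A]}{\partial \theta}
   + (1 - [A])\,\frac{\partial [B]}{\partial \theta} \\
\frac{\partial [A \Rightarrow B]}{\partial \theta}
  &= ([B] - 1)\,\frac{\partial [A]}{\partial \theta}
   + [A]\,\frac{\partial [B]}{\partial \theta}
\end{align*}
```

Each right-hand side only involves atom probabilities and their gradients, so nested formulae can be differentiated by applying these rules recursively.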

Gradient-based optimization methods can be applied to learn embeddings jointly from atoms and other logic formulae.
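
As a rough sketch of what such joint training could look like, the toy example below fits embeddings to one observed atom A = r1(p) and one formula A⇒B with B = r2(p), by gradient descent on the negative log-likelihood. Everything specific here is an assumption made for illustration: the sigmoid-of-dot-product atom probability, the two-dimensional embeddings, their initial values, the learning rate, and all function names. The training setup in the paper is more involved.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def atom_prob(rel, pair):
    """[r(p)] = sigmoid(rel . pair): an assumed differentiable atom probability."""
    return sigmoid(dot(rel, pair))


def loss_and_grads(pair, r1, r2):
    """Negative log-likelihood of the atom A and the formula A⇒B, with gradients."""
    a = atom_prob(r1, pair)              # [A]
    b = atom_prob(r2, pair)              # [B]
    p_impl = a * (b - 1.0) + 1.0         # [A⇒B]
    loss = -math.log(a) - math.log(p_impl)
    # Partial derivatives of the loss with respect to [A] and [B].
    dl_da = -1.0 / a - (b - 1.0) / p_impl
    dl_db = -a / p_impl
    # Chain rule through the sigmoid, giving derivatives w.r.t. the two scores.
    dl_dsa = dl_da * a * (1.0 - a)
    dl_dsb = dl_db * b * (1.0 - b)
    # Chain rule through the dot products, giving derivatives w.r.t. embeddings.
    g_pair = [dl_dsa * x + dl_dsb * y for x, y in zip(r1, r2)]
    g_r1 = [dl_dsa * x for x in pair]
    g_r2 = [dl_dsb * x for x in pair]
    return loss, g_pair, g_r1, g_r2


# Hypothetical initial embeddings for the entity pair and the two relations.
INIT_PAIR, INIT_R1, INIT_R2 = [0.3, 0.6], [0.5, 0.2], [0.1, 0.4]


def train(steps: int = 500, lr: float = 0.5):
    pair, r1, r2 = list(INIT_PAIR), list(INIT_R1), list(INIT_R2)
    history = []
    for _ in range(steps):
        loss, g_pair, g_r1, g_r2 = loss_and_grads(pair, r1, r2)
        history.append(loss)
        # All gradients were computed from the old values, so update together.
        pair = [x - lr * g for x, g in zip(pair, g_pair)]
        r1 = [x - lr * g for x, g in zip(r1, g_r1)]
        r2 = [x - lr * g for x, g in zip(r2, g_r2)]
    return pair, r1, r2, history


pair, r1, r2, history = train()
b_before = atom_prob(INIT_R2, INIT_PAIR)
b_after = atom_prob(r2, pair)
```

Here B is never given as a training atom. Its probability is pushed up through the gradient of [A⇒B], which reaches the embeddings of both r2 and the entity pair. Because the embeddings are shared, the atom A also influences [B] in this toy setup, so the sketch shows the mechanics of joint training rather than isolating the effect of the formula.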

## References

- Rocktäschel, T., Singh, S., & Riedel, S. (2015). Injecting Logical Background Knowledge into Embeddings for Relation Extraction. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1119–1129). Association for Computational Linguistics.