TODO: Nickel et al. (2016)^{[1]} "The basis for the Semantic Web is Description Logic and [109, 110, 111] describe approaches for logic-oriented machine learning approaches in this context. Also to mention are data mining approaches for knowledge graphs as described in [112, 113, 114]."

## Extracting rules from a neural network

Sushil et al. (2018)^{[2]} experimented with extracting rules from a small neural network trained on the 20 newsgroups dataset. The extracted rule set achieves a macro-averaged F1 of 0.8 ^{why not micro average?}, compared with 0.82 for the source neural network. They release their source code and analyze examples.
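To make the macro/micro distinction concrete, here is a minimal sketch of both averages for single-label multi-class predictions. The function name `f1_scores` and the toy labels are made up for illustration; this is not the evaluation code of Sushil et al., and it does not say why they chose macro averaging.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the true class was t
            fn[t] += 1  # missed an instance of the true class t

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so every class weighs the same.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool the counts over all classes, so every instance weighs the same.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro

# Toy, made-up labels: class "a" is frequent, class "b" is rare.
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "a", "a", "b"]
macro, micro = f1_scores(y_true, y_pred)
print(f"macro-F1 = {macro:.3f}, micro-F1 = {micro:.3f}")
```

In this toy case the error on the rare class lowers the macro average more than the micro average. In the single-label setting, micro-averaged F1 coincides with accuracy.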

## Statistical approaches

From Yang et al. (2015)^{[3]}: "The key problem of extracting Horn rules like the aforementioned example is how to effectively explore the search space. Traditional rule mining approaches directly operate on the KB graph – they search for possible rules (i.e. closed-paths in the graph) by pruning rules with low statistical significance and relevance (Schoenmackers et al., 2010). These approaches often fail on large KB graphs due to scalability issues."

## Embedding-based approaches

### Extracting Horn rules

Yang et al. (2015)^{[3]} use embeddings to extract Horn rules of length 2: $ B_1(a,b) \wedge B_2(b,c) \Rightarrow H(a,c) $. For each pair of relations, they build a composed representation: the sum of the two relation vectors, or the product if the relations are represented as matrices. The composed representation should be "similar" to the representation of *H*, where similarity is measured by Euclidean distance for vectors and by the Frobenius norm for matrices. Because the algorithm only needs to iterate over (a subset of) relation combinations rather than over nodes in the graph, it runs much faster than traditional statistical approaches. Yang et al. also report that it is more accurate than AMIE (Galárraga et al., 2013^{[4]}).
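The vector case of this idea can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names, the `top_k` parameter, the rule that a head may not appear in its own body, and the two-dimensional embeddings are all assumptions made for the example, and any pruning of relation pairs is left out.

```python
import math
from itertools import product

def compose(b1, b2):
    """Vector case: compose two relation vectors by element-wise addition."""
    return [x + y for x, y in zip(b1, b2)]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def mine_length2_rules(relations, top_k=1):
    """For each head H, rank body pairs (B1, B2) by the distance of B1 + B2 to H.

    `relations` maps a relation name to its embedding vector.
    Returns {head: [(distance, b1, b2), ...]} with the `top_k` closest bodies.
    """
    rules = {}
    for head, h_vec in relations.items():
        scored = []
        # Iterate over relation pairs only; entities are never touched.
        for b1, b2 in product(relations, repeat=2):
            if head in (b1, b2):  # assumption: skip bodies that contain the head
                continue
            dist = euclidean(compose(relations[b1], relations[b2]), h_vec)
            scored.append((dist, b1, b2))
        scored.sort()
        rules[head] = scored[:top_k]
    return rules

# Made-up embeddings, chosen so that one composition matches exactly.
relations = {
    "born_in_city": [1.0, 0.0],
    "city_in_country": [0.0, 1.0],
    "nationality": [1.0, 1.0],
}
rules = mine_length2_rules(relations)
best = rules["nationality"][0]
print(best)
```

Because vector addition is commutative, this sketch cannot tell the body order (B1, B2) from (B2, B1); ties are broken alphabetically by the sort. The matrix case described above would replace the addition with a matrix product and the Euclidean distance with a Frobenius norm.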

## Evaluation

From Yang et al. (2015)^{[3]}: "We consider precision as the evaluation metric, which is the ratio of predictions that are in the test (unseen) data to all the generated unseen predictions. Note that this is an estimation, since a prediction is not necessarily “incorrect” if it is not seen. Galárraga et al. (2013) suggested to identify incorrect predictions based on the functional property of relations. However, we find that most relations in our datasets are not functional."
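A minimal sketch of this precision estimate for a single length-2 rule is given below. The triples, relation names, and helper functions are hypothetical and only illustrate the ratio described in the quote; they are not taken from the paper's code or data.

```python
def apply_rule(facts, b1, b2, head):
    """Apply B1(a,b) AND B2(b,c) => H(a,c) to a set of (subject, relation, object) triples."""
    objects_of_b2 = {}
    for s, r, o in facts:
        if r == b2:
            objects_of_b2.setdefault(s, set()).add(o)
    return {
        (a, head, c)
        for a, r, b in facts
        if r == b1
        for c in objects_of_b2.get(b, ())
    }

def estimated_precision(train, test, rule):
    """Share of unseen predictions (not in train) that appear in the test set."""
    unseen = apply_rule(train, *rule) - train
    if not unseen:
        return 0.0  # arbitrary convention for rules that predict nothing new
    return len(unseen & test) / len(unseen)

# Hypothetical toy knowledge base.
train = {
    ("alice", "born_in_city", "paris"),
    ("bob", "born_in_city", "rome"),
    ("carol", "born_in_city", "paris"),
    ("paris", "city_in_country", "france"),
    ("rome", "city_in_country", "italy"),
    ("carol", "nationality", "france"),  # already known, so not an unseen prediction
}
test = {("alice", "nationality", "france")}
rule = ("born_in_city", "city_in_country", "nationality")

precision = estimated_precision(train, test, rule)
print(precision)
```

Here the rule yields two unseen predictions, one for `alice` and one for `bob`. Only the first is in the test set, so the estimate is 0.5, although the prediction for `bob` could still be true. This is the sense in which the quote calls the metric an estimation.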

## References

1. Nickel, M., Murphy, K., Tresp, V., & Gabrilovich, E. (2016). A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1), 11–33. doi:10.1109/JPROC.2015.2483592
2. Sushil, M., Šuster, S., & Daelemans, W. (2018). Rule induction for global explanation of trained models. Retrieved from http://arxiv.org/abs/1808.09744
3. Yang, B., Yih, W., He, X., Gao, J., & Deng, L. (2015). Embedding Entities and Relations for Learning and Inference in Knowledge Bases. ICLR 2015, 12. Retrieved from http://arxiv.org/abs/1412.6575
4. Galárraga, L. A., Preda, N., & Suchanek, F. M. (2013). Mining Rules to Align Knowledge Bases. Proceedings of the 2013 Workshop on Automated Knowledge Base Construction, 43–48. doi:10.1145/2509558.2509566