TODO: cover Andreas et al. (2016), the best long paper at NAACL 2016, which is very interesting and may have a lasting impact.
With the wave of deep learning, researchers have paid increasing attention to distributed representations. Although distributed representations are successful in many tasks, it has long been known that they have serious drawbacks in exactly the areas where logic is strong, such as compositionality. Interest in combining the two has therefore grown significantly.
This line of research can be framed within the larger topic of combining symbolic and sub-symbolic approaches, which was popular in the 1980s and 1990s (e.g. Hinton, 1986; Ultsch, 1994, 1995). Recent research, however, has a narrower aim and more precise terminology.
Direct mapping
Herbelot & Vecchi (2015): "We predict that there is a functional relationship between distributional information and vectorial concept representations in which dimensions are predicates and weights are generalised quantifiers."
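The hypothesised functional relationship can be illustrated with a minimal sketch, assuming the function is linear and fitted by ordinary least squares (a simple stand-in, not necessarily the regression Herbelot & Vecchi used). The numeric quantifier encoding in `QUANTIFIERS`, the predicates, the concept vectors and all helper names are hypothetical; the point is only that each output dimension is a predicate and each predicted weight is read back as the nearest quantifier.

```python
# Sketch: learn a linear map from distributional vectors to "model-theoretic"
# vectors whose dimensions are predicates and whose values encode quantifiers.
# Plain least squares is an assumption; all data below is made up.

# Hypothetical numeric encoding of generalised quantifiers.
QUANTIFIERS = {"no": 0.0, "few": 0.05, "some": 0.35, "most": 0.95, "all": 1.0}


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve A X = B for square A by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def fit_linear_map(X, Y):
    """Return B minimising ||X B - Y||^2 via the normal equations (X^T X) B = X^T Y."""
    xt = transpose(X)
    return solve(matmul(xt, X), matmul(xt, Y))


def nearest_quantifier(value):
    return min(QUANTIFIERS, key=lambda q: abs(QUANTIFIERS[q] - value))


# Four hypothetical concepts (rows of X) with target values for the
# hypothetical predicates [is_green, barks] (rows of Y).
X = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
Y = [[QUANTIFIERS["most"], QUANTIFIERS["no"]],
     [QUANTIFIERS["no"], QUANTIFIERS["most"]],
     [QUANTIFIERS["few"], QUANTIFIERS["no"]],
     [QUANTIFIERS["some"], QUANTIFIERS["some"]]]

B = fit_linear_map(X, Y)
new_concept = [0.9, 0.1, 0.0]  # hypothetical unseen concept
predicted = matmul([new_concept], B)[0]
labels = [nearest_quantifier(v) for v in predicted]
```

Here `labels` gives, for the unseen concept, one quantifier per predicate, e.g. how many of its instances are green. A real system would use high-dimensional distributional vectors and many annotated concepts.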
Proposition completion
Hinton (1986) trained a neural network on two family trees and obtained interesting distributed representations of family members as a by-product. The trees were turned into 104 propositions of the form (person1, relation, person2), of which 100 were used for training. For each proposition, the network was given the fillers of the first two roles and asked to predict the filler of the third.
As of 2014, the paper had been cited more than 500 times. The approach, however, appears limited in both its range of applications and its scalability.
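The proposition-completion setup can be sketched as a data-encoding step, assuming one-hot inputs for the first two role fillers and a multi-hot target over people (the multi-hot choice is our assumption for (person, relation) pairs with several answers). The family fragment, names and helper functions are all hypothetical; the network itself is omitted.

```python
import random

# Hypothetical family-tree fragment as (person1, relation, person2) triples.
# Names and facts are made up; Hinton's data had two trees and 104 triples.
PROPOSITIONS = [
    ("Ann", "husband", "Bob"),
    ("Bob", "wife", "Ann"),
    ("Ann", "son", "Carl"),
    ("Ann", "daughter", "Dora"),
    ("Bob", "son", "Carl"),
    ("Bob", "daughter", "Dora"),
    ("Carl", "mother", "Ann"),
    ("Carl", "sister", "Dora"),
    ("Dora", "brother", "Carl"),
    ("Carl", "uncle", "Ed"),
    ("Carl", "uncle", "Finn"),
]


def one_hot(index, size):
    v = [0.0] * size
    v[index] = 1.0
    return v


def make_examples(props):
    """Encode each triple as an (input, target) pair for proposition completion.

    Input: one-hot person1 concatenated with one-hot relation.
    Target: multi-hot over people, marking every person2 that completes
    (person1, relation, ?) anywhere in the data.
    """
    people = sorted({p for p1, _, p2 in props for p in (p1, p2)})
    relations = sorted({r for _, r, _ in props})
    p_idx = {p: i for i, p in enumerate(people)}
    r_idx = {r: i for i, r in enumerate(relations)}
    answers = {}
    for p1, rel, p2 in props:
        answers.setdefault((p1, rel), set()).add(p2)
    examples = []
    for p1, rel, _ in props:
        x = one_hot(p_idx[p1], len(people)) + one_hot(r_idx[rel], len(relations))
        y = [1.0 if p in answers[(p1, rel)] else 0.0 for p in people]
        examples.append((x, y))
    return examples, people, relations


def train_test_split(examples, n_test, seed=0):
    """Hold out n_test examples, analogous to the 100/4 split in the original study."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


examples, people, relations = make_examples(PROPOSITIONS)
train, test = train_test_split(examples, n_test=2)
```

A network trained on `train` must output the third filler for held-out pairs in `test`, which it can only do if its internal representations of people capture regularities such as generation and family branch.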
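Paccanaro & Hinton (2000) proposed linear relational embedding (LRE), a somewhat simpler model in which concepts are represented as vectors and relations as matrices that map one concept vector onto another. A later paper by the same authors extended the model to handle cases with no correct answer or with multiple correct answers.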
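The LRE idea of answering (a, R, ?) by applying a relation matrix to a concept vector can be sketched as follows, assuming candidates are scored with a softmax over negative squared distances to R·a (our reading of LRE's probabilistic formulation). The concepts, the hand-set "plus one" matrix and the function names are hypothetical; LRE would learn both vectors and matrices from data.

```python
import math

# Toy concepts: the numbers 0-3 under addition mod 4, embedded as 2-D vectors.
# These are hand-set for illustration; LRE would learn them.
CONCEPTS = {
    "0": [1.0, 0.0],
    "1": [0.0, 1.0],
    "2": [-1.0, 0.0],
    "3": [0.0, -1.0],
}
# Hand-set relation matrix for "plus one": a 90-degree rotation.
PLUS_ONE = [[0.0, -1.0],
            [1.0, 0.0]]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def answer_distribution(concepts, relation, a):
    """P(b | a, R) proportional to exp(-||R a - b||^2) over all known concepts b."""
    predicted = matvec(relation, concepts[a])
    scores = {name: math.exp(-sq_dist(predicted, vec)) for name, vec in concepts.items()}
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}


def complete(concepts, relation, a):
    """Answer (a, R, ?) with the most probable concept."""
    dist = answer_distribution(concepts, relation, a)
    return max(dist, key=dist.get)
```

For example, `complete(CONCEPTS, PLUS_ONE, "3")` wraps around to `"0"`. Because relations are matrices, composing relations is matrix multiplication, which is part of what makes the linear model simpler than a multi-layer network.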
Relation predicting
Bowman (2014) used a neural network with one hidden layer and a softmax output layer to predict the relation between two phrases, chosen from seven relations: forward entailment, reverse entailment, equivalence, alternation, negation, cover, and independence.
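A classifier of the shape described above can be sketched as a forward pass, assuming the two phrase vectors are simply concatenated before the hidden layer (how the phrases are composed and compared in the original work is not reproduced here). The weights are random and untrained, and all names and dimensions are hypothetical.

```python
import math
import random

# The seven relations listed above.
RELATIONS = ["equivalence", "forward entailment", "reverse entailment",
             "alternation", "negation", "cover", "independence"]


def dense(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(vec_a, vec_b, params):
    """Return (predicted relation, class probabilities) for two phrase vectors."""
    W1, b1, W2, b2 = params
    x = vec_a + vec_b                              # concatenate the two phrase vectors
    h = [math.tanh(v) for v in dense(W1, b1, x)]   # single tanh hidden layer
    probs = softmax(dense(W2, b2, h))              # softmax over the seven relations
    best = max(range(len(probs)), key=probs.__getitem__)
    return RELATIONS[best], probs


def random_params(dim, hidden, seed=0):
    """Hypothetical untrained weights; a real model would learn these."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return (mat(hidden, 2 * dim), [0.0] * hidden,
            mat(len(RELATIONS), hidden), [0.0] * len(RELATIONS))


params = random_params(dim=3, hidden=4)
label, probs = classify([0.2, -0.1, 0.7], [0.3, 0.0, 0.5], params)
```

With untrained weights the predicted label is meaningless; training would adjust `params` by cross-entropy on labelled phrase pairs.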
Relation classification
- Main article: Models of relation classification
TODO: Socher et al. 2013
Probabilistic inference informed by distributional similarity
Beltagy et al. (2013) performed recognizing textual entailment and semantic textual similarity by casting both as probabilistic entailment in Markov logic. For example, the similarity between the two sentences
S1: A man is slicing a cucumber.
S2: A man is slicing a zucchini.
is judged as the average degree of mutual entailment ($ S_1 \models S_2 $ and $ S_2 \models S_1 $). Strictly speaking, S1 does not entail S2, nor vice versa. The authors address this by adding the weighted rule cucumber(x) → zucchini(x) | wt(cucumber, zucchini), which literally means "if something is a cucumber, it is also a zucchini" and holds with weight wt(cucumber, zucchini). Here wt(·,·) is a function of the cosine similarity between the distributional vectors of the two words.
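The mechanism can be sketched with a tiny ground Markov logic network solved by brute-force enumeration, assuming the two sentences differ only in their object noun so that every shared atom is observed evidence. The word vectors, the linear `wt` function and all helper names are hypothetical, not the system of Beltagy et al.; only the averaging of the two entailment directions follows the text.

```python
import math
from itertools import product


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def query_probability(atoms, clauses, evidence, query):
    """P(query | evidence) in a tiny ground Markov logic network, by enumeration.

    clauses: (weight, formula) pairs; formula maps a world (atom -> bool) to bool.
    A world's unnormalised probability is exp(sum of weights of satisfied clauses).
    """
    free = [a for a in atoms if a not in evidence]
    z = num = 0.0
    for values in product([False, True], repeat=len(free)):
        world = dict(evidence)
        world.update(zip(free, values))
        score = math.exp(sum(w for w, f in clauses if f(world)))
        z += score
        if query(world):
            num += score
    return num / z


# Hypothetical word vectors; real ones would come from a distributional model.
VECTORS = {"cucumber": [0.9, 0.4, 0.1], "zucchini": [0.8, 0.5, 0.2]}


def wt(w1, w2, scale=2.0):
    # Assumption: a simple linear function of cosine similarity.
    return scale * cosine(VECTORS[w1], VECTORS[w2])


def entailment_probability(noun_1, noun_2):
    """P(S2 | S1) when the sentences differ only in their object noun.

    Shared atoms (man, slice, agent, patient) are observed and omitted; we
    condition on noun_1(x) and query noun_2(x) under the rule noun_1(x) -> noun_2(x).
    """
    atoms = [noun_1, noun_2]
    rule = (wt(noun_1, noun_2), lambda w: (not w[noun_1]) or w[noun_2])
    return query_probability(atoms, [rule], {noun_1: True}, lambda w: w[noun_2])


p_12 = entailment_probability("cucumber", "zucchini")  # S1 |= S2
p_21 = entailment_probability("zucchini", "cucumber")  # S2 |= S1
similarity = (p_12 + p_21) / 2
```

With a single rule and its antecedent observed, the query probability reduces to the logistic function of the rule weight, so more similar words yield a higher entailment probability. Exhaustive enumeration is exponential in the number of free atoms; it is only meant to make the semantics concrete.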
TODO: Further development: Beltagy et al. (2014).
- Andreas, J., Rohrbach, M., Darrell, T., & Klein, D. (2016). Learning to Compose Neural Networks for Question Answering. NAACL 2016.
- Hinton, G. E. (1986, August). Learning distributed representations of concepts. In Proceedings of the eighth annual conference of the cognitive science society (Vol. 1, p. 12).
- Ultsch, A. (1994). The integration of neural networks with symbolic knowledge processing. In New Approaches in Classification and Data Analysis (pp. 445-454). Springer Berlin Heidelberg.
- Ultsch, A., & Korus, D. (1995, November). Integration of neural networks with knowledge-based systems. In Neural Networks, 1995. Proceedings., IEEE International Conference on (Vol. 4, pp. 1828-1833). IEEE.
- Socher, R., Chen, D., Manning, C. D., & Ng, A. (2013). Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems (pp. 926-934).
- Rocktäschel, T., Bosnjak, M., Singh, S., & Riedel, S. (2014). Low-Dimensional Embeddings of Logic. ACL 2014 Workshop on Semantic Parsing.
- Herbelot, A., & Vecchi, E. M. (2015). Building a shared world: Mapping distributional to model-theoretic semantic spaces.
- Paccanaro, A., & Hinton, G. E. (2000). Learning Distributed Representations by Mapping Concepts and Relations into a Linear Space. In P. Langley (Ed.), Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000) (pp. 711-718). Stanford University. San Francisco: Morgan Kaufmann.
- Paccanaro, A., & Hinton, G. E. (2000). Extracting distributed representations of concepts and relations from positive and negative propositions. In Neural Networks, 2000. IJCNN 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on (Vol. 2, pp. 259-264). IEEE.
- Bowman, S. R. (2014). Can recursive neural tensor networks learn logical reasoning? In ICLR 2014.
- Beltagy, I., Chau, C., Boleda, G., Garrette, D., & Erk, K. (2013). Montague Meets Markov: Deep Semantics with Probabilistic Logical Form. In Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*SEM 2013), 11–21.
- Beltagy, I., Roller, S., Boleda, G., Erk, K., & Mooney, R. J. (2014). UTexas: Natural Language Semantics using Distributional Semantics and Probabilistic Logic. SemEval 2014, 796.