With the rise of deep learning, researchers have paid increasing attention to distributed representations. Although this approach has been successful in many tasks, it has long been known to have serious drawbacks in areas where logic is strong, such as compositionality. Interest in combining the two has therefore also risen significantly.

This line of research can be framed within the larger topic of combining symbolic and sub-symbolic approaches, which was fashionable during the 1980s and 1990s (e.g. Hinton, 1986^{[2]}; Ultsch, 1994^{[3]}, 1995^{[4]}). However, the aims of recent research have narrowed and the terminology has been much distilled.

Different models have been proposed for specific tasks, such as knowledge base completion (Socher et al. 2013^{[5]}) and small-scale reasoning (Rocktäschel 2014^{[6]}).

## Approaches

### Direct mapping

Herbelot & Vecchi (2015):
"We predict that there is a functional relationship between distributional information and vectorial concept representations in which dimensions are predicates and weights are generalised quantifiers."

### Proposition completion

Hinton (1986)^{[2]} had his neural network learn two family trees and obtained interesting representations of family members as a by-product. The trees were turned into 104 propositions (*person1*, *relation*, *person2*), of which 100 were used for training. For each proposition, the neural network was given the fillers of the first two roles and asked to predict the filler of the third.
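
The input/target format described above can be sketched as follows. This is only an illustration of how a proposition might be turned into a training example: the four toy triples, the names, and the one-hot encoding are assumptions made for this sketch, not the paper's actual data or network.

```python
# Toy (person1, relation, person2) propositions; names and relations are
# illustrative assumptions, not the family trees used in the paper.
triples = [
    ("carol", "has_mother", "alice"),
    ("carol", "has_father", "bob"),
    ("alice", "has_daughter", "carol"),
    ("bob", "has_daughter", "carol"),
]

# Fixed vocabularies so that every person and relation gets a stable index.
people = sorted({p for a, _, b in triples for p in (a, b)})
relations = sorted({r for _, r, _ in triples})


def one_hot(item, vocabulary):
    """Return a one-hot list marking the position of item in vocabulary."""
    return [1.0 if v == item else 0.0 for v in vocabulary]


def encode(triple):
    """Map a proposition to (inputs, target).

    inputs: fillers of the first two roles (person1 and relation).
    target: filler of the third role (person2), which the network predicts.
    """
    person1, relation, person2 = triple
    inputs = one_hot(person1, people) + one_hot(relation, relations)
    target = one_hot(person2, people)
    return inputs, target


# Hold out the last proposition for testing, train on the rest.
examples = [encode(t) for t in triples]
train_examples, test_examples = examples[:-1], examples[-1:]
```

A network trained on `train_examples` would then be evaluated on whether it predicts the held-out `target` from its `inputs`.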

As of 2014, the paper had been cited more than 500 times. The approach nevertheless seems limited in its range of applications and in scalability.

Paccanaro & Hinton (2000)^{[8]} proposed linear relational embedding, which is somewhat simpler. Their later paper^{[9]} extended the model to handle special cases in which there is no answer or there are multiple answers.

### Relation predicting

Bowman (2014)^{[10]} employed a neural network with one hidden layer and one softmax layer to predict the relation (one of entailment, reverse entailment, equivalent, alternation, negation, cover, and independent) between two phrases.
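
A forward pass through a classifier of this shape might look like the sketch below. Everything beyond "one hidden layer plus a softmax over the seven relations" is an assumption made for illustration: the concatenation of the two phrase vectors, the tanh nonlinearity, the dimensions, and the random untrained weights are not taken from the paper.

```python
import math
import random

# The seven relation labels listed above.
RELATIONS = [
    "entailment", "reverse entailment", "equivalent", "alternation",
    "negation", "cover", "independent",
]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def predict_relation(left, right, w_hidden, w_out):
    """Return a probability for each relation between two phrase vectors."""
    features = left + right  # assumption: phrases are combined by concatenation
    hidden = [math.tanh(h) for h in matvec(w_hidden, features)]  # hidden layer
    probs = softmax(matvec(w_out, hidden))  # softmax layer
    return dict(zip(RELATIONS, probs))


rng = random.Random(0)
DIM, HIDDEN = 4, 8  # hypothetical sizes
w_hidden = random_matrix(HIDDEN, 2 * DIM, rng)
w_out = random_matrix(len(RELATIONS), HIDDEN, rng)
left = [rng.uniform(-1, 1) for _ in range(DIM)]
right = [rng.uniform(-1, 1) for _ in range(DIM)]
probs = predict_relation(left, right, w_hidden, w_out)
```

With untrained weights the distribution is close to uniform; training would adjust `w_hidden` and `w_out` so that the correct relation receives the highest probability.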

### Relation classification

### Probabilistic inference informed by distributional similarity

Beltagy et al. (2013)^{[11]} addressed textual entailment recognition and semantic textual similarity by casting them as probabilistic entailment in Markov logic. For example, the similarity between the two sentences:

*S*_{1}: A man is slicing a cucumber.

*S*_{2}: A man is slicing a zucchini.

is judged as the average degree of mutual entailment ($ S_1 \models S_2 $ and $ S_2 \models S_1 $). Strictly speaking, *S*_{1} does not entail *S*_{2}, nor vice versa. The authors addressed this by adding the rule cucumber(x)→zucchini(x) | wt(cuc., zuc.), which literally means "if something is a cucumber, it is also a zucchini" (with inference cost = wt(...)). Here wt(.) is a function of the cosine similarity between the two words.
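
The two numerical ingredients mentioned above, cosine similarity between word vectors and the average of the two entailment degrees, can be sketched as follows. The word vectors and entailment probabilities are made-up values for illustration; the Markov logic inference that would produce the probabilities, and the exact form of wt(.), are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sentence_similarity(p_s1_entails_s2, p_s2_entails_s1):
    """Average degree of mutual entailment between two sentences."""
    return (p_s1_entails_s2 + p_s2_entails_s1) / 2.0


# Hypothetical distributional vectors for the two words.
cucumber = [0.9, 0.1, 0.3]
zucchini = [0.8, 0.2, 0.4]
word_similarity = cosine(cucumber, zucchini)  # would feed into wt(cuc., zuc.)

# Hypothetical entailment probabilities, standing in for inference output.
similarity = sentence_similarity(0.8, 0.6)
```

A higher `word_similarity` would make the added cucumber/zucchini rule cheaper to apply, which in turn raises both entailment probabilities and therefore the sentence similarity.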

