## (Generalized) siamese architecture

Bordes et al. (2011) generalized the siamese architecture to learn relations in knowledge bases. This approach is also called the distance model. Its main problem is that the parameters of the two entity vectors do not interact with each other; they are independently mapped to a common space.
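The page does not give the distance model's scoring function. A minimal sketch, assuming the commonly stated form $g(e_1, R, e_2) = \lVert W_{R,1} e_1 - W_{R,2} e_2 \rVert_1$ (lower means more plausible), illustrates why the entity vectors do not interact: each one is projected by its own matrix before the two are compared. The function name, the dimension `d`, and the random values are illustrative assumptions.

```python
import numpy as np


def distance_score(e1, e2, W_R1, W_R2):
    """Assumed distance-model score: L1 distance between the projections.

    Each entity vector is mapped by its own relation-specific matrix;
    the vectors only meet in the final subtraction.
    """
    return float(np.abs(W_R1 @ e1 - W_R2 @ e2).sum())


# Hypothetical toy setup: d-dimensional entities, random parameters.
rng = np.random.default_rng(0)
d = 4
e1, e2 = rng.normal(size=d), rng.normal(size=d)
W_R1, W_R2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))

score = distance_score(e1, e2, W_R1, W_R2)
```

Because the score is a norm, it is never negative, and it is exactly zero when both entities are projected to the same point.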

## Single Layer Model

The second model tries to alleviate the problems of the distance model by connecting the entity vectors implicitly through the nonlinearity of a standard, single layer neural network. The scoring function has the following form:

$g(e_1, R, e_2) = u_R^\intercal f\left(W_{R,1} e_1 + W_{R,2} e_2\right) = u_R^\intercal f\left([W_{R,1} \; W_{R,2}]\begin{bmatrix}e_1\\e_2\end{bmatrix}\right),$

where $f = \tanh$, and $W_{R,1}, W_{R,2} \in \mathbb{R}^{k \times d}$ and $u_R \in \mathbb{R}^{k \times 1}$ are the parameters of relation $R$'s scoring function. While this is an improvement over the distance model, the nonlinearity only provides a weak interaction between the two entity vectors at the expense of a harder optimization problem. Collobert and Weston trained a similar model to learn word vector representations using words in their context. This model is a special case of the neural tensor network if the tensor is set to 0.
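The two equivalent forms of the single layer scoring function can be checked numerically. The following is a minimal sketch under assumed toy dimensions (`d`, `k`) and random parameters; the function names are hypothetical and not taken from any library.

```python
import numpy as np


def single_layer_score(e1, e2, W_R1, W_R2, u_R):
    """g = u_R^T tanh(W_R1 e1 + W_R2 e2)."""
    hidden = np.tanh(W_R1 @ e1 + W_R2 @ e2)
    return float(u_R @ hidden)


def single_layer_score_stacked(e1, e2, W_R1, W_R2, u_R):
    """Same score, written with the stacked matrix [W_R1 W_R2] and vector [e1; e2]."""
    W = np.hstack([W_R1, W_R2])        # shape (k, 2d)
    e = np.concatenate([e1, e2])       # shape (2d,)
    return float(u_R @ np.tanh(W @ e))


# Hypothetical toy setup.
rng = np.random.default_rng(0)
d, k = 4, 3
e1, e2 = rng.normal(size=d), rng.normal(size=d)
W_R1, W_R2 = rng.normal(size=(k, d)), rng.normal(size=(k, d))
u_R = rng.normal(size=k)

score_a = single_layer_score(e1, e2, W_R1, W_R2, u_R)
score_b = single_layer_score_stacked(e1, e2, W_R1, W_R2, u_R)
```

The two scores agree up to floating-point error. The entity vectors are only added together before the `tanh`, which is the sense in which their interaction is weak.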

## Hadamard Model

This model was introduced by Bordes et al. and tackles the issue of weak entity vector interaction through multiple matrix products followed by Hadamard products. It differs from the other models in this comparison in that it represents each relation simply as a single vector that interacts with the entity vectors through several linear products, all of which are parameterized by the same parameters. The scoring function is as follows:

$g(e_1,R, e_2) = (W_1 e_1 \otimes W_{rel,1} e_R + b_1)^\intercal (W_2 e_2 \otimes W_{rel,2} e_R + b_2)$

where $\otimes$ denotes the element-wise (Hadamard) product, and $W_1, W_{rel,1}, W_2, W_{rel,2} \in \mathbb{R}^{d \times d}$ and $b_1, b_2 \in \mathbb{R}^{d \times 1}$ are parameters that are shared by all relations. The only relation-specific parameter is $e_R$. While this allows the model to treat relational words and entity words the same way, Socher et al. (2013) show in their experiments that giving each relation its own matrix operators results in improved performance. However, the bilinear form between entity vectors is by itself desirable.
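The Hadamard scoring function can be sketched as follows, assuming the element-wise product binds more tightly than the addition of the bias. The function name, the dimension `d`, and the sanity-check parameter values are illustrative assumptions.

```python
import numpy as np


def hadamard_score(e1, e2, e_R, W1, W2, W_rel1, W_rel2, b1, b2):
    """g = (W1 e1 * W_rel1 e_R + b1)^T (W2 e2 * W_rel2 e_R + b2).

    `*` is the element-wise (Hadamard) product. All matrices and biases
    are shared across relations; only e_R is relation-specific.
    """
    left = (W1 @ e1) * (W_rel1 @ e_R) + b1
    right = (W2 @ e2) * (W_rel2 @ e_R) + b2
    return float(left @ right)


# Hypothetical toy setup.
rng = np.random.default_rng(0)
d = 4
e1, e2, e_R = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)

# Sanity check: identity matrices, zero biases and an all-ones relation
# vector reduce the score to the plain dot product of the two entities.
I, zero = np.eye(d), np.zeros(d)
reduced = hadamard_score(e1, e2, np.ones(d), I, I, I, I, zero, zero)
```

With zero biases, the relation vector acts as a per-dimension weighting of a bilinear form between the projected entity vectors, which is the bilinear structure the paragraph above calls desirable.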

## Bilinear Model

The bilinear model (Jenatton et al., 2012; Sutskever et al., 2009) fixes the issue of weak entity vector interaction through a relation-specific bilinear form. The scoring function is as follows:

$g(e_1, R, e_2) = e^\intercal_1 W_R e_2,$

where $W_R \in \mathbb{R}^{d \times d}$ is the only parameter of relation $R$'s scoring function. This is a big improvement over the two previous models, as it incorporates the interaction of the two entity vectors in a simple and efficient way. However, the model is now restricted in terms of expressive power and number of parameters by the word vectors. The bilinear form can only model linear interactions and cannot fit more complex scoring functions. This model is a special case of the neural tensor network with $V_R = 0$, $b_R = 0$, $k = 1$, and $f = \text{identity}$. In comparison to bilinear models, the neural tensor network has much more expressive power, which will be useful especially for larger databases. For smaller datasets, the number of slices could be reduced or even vary between relations.
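The restriction to linear interactions can be made concrete with a minimal sketch of the bilinear score. The function name, the dimension `d`, and the random values are illustrative assumptions.

```python
import numpy as np


def bilinear_score(e1, e2, W_R):
    """g = e1^T W_R e2, with W_R the only relation-specific parameter."""
    return float(e1 @ W_R @ e2)


# Hypothetical toy setup.
rng = np.random.default_rng(0)
d = 4
e1, e2, e3 = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
W_R = rng.normal(size=(d, d))

# The score is linear in each entity vector when the other is held fixed.
lhs = bilinear_score(2.0 * e1 + e3, e2, W_R)
rhs = 2.0 * bilinear_score(e1, e2, W_R) + bilinear_score(e3, e2, W_R)
```

`lhs` and `rhs` agree up to floating-point error. The number of parameters per relation is fixed at $d \times d$ by the entity vector dimension, which is the restriction described above.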

## Neural tensor network

Socher et al. (2013)
