Given a family of functions $ G_W $ parameterized by *W*, we seek to find a value of the parameter such that the similarity metric $ E_W(X_1, X_2)=||G_W(X_1) - G_W(X_2)||_p $ is small if $ X_1 $ and $ X_2 $ are from the same category, and large if they belong to different categories.
The system is trained on pairs of patterns taken from a training set. The loss function used in training makes $ E_W(X_1, X_2) $ small when $ X_1 $ and $ X_2 $ are from the same category, and large when they belong to different categories. No assumption is made about the nature of $ G_W $ other than differentiability with respect to *W*. Because the same function with the same parameter is used to process both inputs, the similarity metric is symmetric.
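The setup above can be sketched in a few lines. This is a minimal illustration, not the original implementation: $ G_W $ is a hypothetical linear map, and the margin-based pair loss is one common (contrastive-style) choice of the loss described above.

```python
import numpy as np

def G(W, x):
    # Hypothetical embedding function G_W: here a single linear map.
    return W @ x

def E(W, x1, x2, p=2):
    # Similarity metric E_W(X1, X2) = ||G_W(X1) - G_W(X2)||_p
    return np.linalg.norm(G(W, x1) - G(W, x2), ord=p)

def pair_loss(W, x1, x2, same, m=1.0):
    # Contrastive-style loss (one possible instantiation, margin m is an
    # assumed hyperparameter): pull same-category pairs together,
    # push different-category pairs apart up to the margin.
    d = E(W, x1, x2)
    return d**2 if same else max(0.0, m - d)**2

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))
x1, x2 = rng.standard_normal(3), rng.standard_normal(3)

# The same G_W processes both inputs, so the metric is symmetric.
assert np.isclose(E(W, x1, x2), E(W, x2, x1))
```

Since both branches share the parameter *W*, gradients from each input flow into the same weights, which is what makes the learned metric symmetric by construction.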

## Generalization

Bordes et al. (2011)^{[1]} devised an architecture that learns several asymmetric relations simultaneously.

## References

1. Bordes, A., Weston, J., Collobert, R., & Bengio, Y. (2011, April). Learning Structured Embeddings of Knowledge Bases. In *AAAI*.