Given a family of functions $ G_W $ parameterized by $ W $, we seek a value of $ W $ such that the similarity metric $ E_W(X_1, X_2)=||G_W(X_1) - G_W(X_2)||_p $ is small if $ X_1 $ and $ X_2 $ are from the same category, and large if they belong to different categories. The system is trained on pairs of patterns taken from a training set. The loss function is designed so that training decreases $ E_W(X_1, X_2) $ when $ X_1 $ and $ X_2 $ are from the same category, and increases $ E_W(X_1, X_2) $ when they belong to different categories. No assumption is made about the nature of $ G_W $ other than differentiability with respect to $ W $. Because the same function with the same parameter is used to process both inputs, the similarity metric is symmetric: $ E_W(X_1, X_2) = E_W(X_2, X_1) $.
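The description above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original method: the mapping `g` (one linear layer followed by tanh), the margin-based `pair_loss`, and all numeric values are hypothetical choices made only to show how a shared parameter yields a symmetric metric and how a loss can pull same-category pairs together while pushing different-category pairs apart.

```python
import math

def g(w, x):
    """Hypothetical G_W: one linear layer followed by tanh.
    w is a list of weight rows, x is a list of floats."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]

def energy(w, x1, x2, p=2):
    """E_W(X1, X2) = ||G_W(X1) - G_W(X2)||_p, with the same w for both inputs."""
    a, b = g(w, x1), g(w, x2)
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)

def pair_loss(w, x1, x2, same, margin=1.0):
    """Illustrative margin-based loss (an assumption, not the source's exact loss).
    Same-category pairs are penalized by their squared energy; different-category
    pairs are penalized only while their energy is below `margin`."""
    e = energy(w, x1, x2)
    return e ** 2 if same else max(0.0, margin - e) ** 2

# Arbitrary example values, chosen only for illustration.
w = [[0.5, -0.2], [0.1, 0.3]]
x1, x2 = [1.0, 2.0], [0.5, -1.0]

# Swapping the inputs leaves the energy unchanged because w is shared.
print(energy(w, x1, x2) == energy(w, x2, x1))
```

Since `g` is built from differentiable operations, `pair_loss` could in principle be minimized with respect to `w` by gradient descent, which is the only property the text requires of $ G_W $.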
Bordes et al. (2011) devised an architecture that learns several asymmetric relations at the same time.
- Bordes, A., Weston, J., Collobert, R., & Bengio, Y. (2011, April). Learning Structured Embeddings of Knowledge Bases. In AAAI.