
### Knowledge base

The key idea is that good embeddings for predicting triples in a knowledge base are those that cluster similar synsets together. By training a model on a knowledge base completion task, researchers hope to obtain good synset embeddings as a byproduct.

Given a triple, e.g. (cat, has_part, tail), let l denote the left synset, r the right synset, and t the relation between them.

In the literature, there are two ways to formalize this task:

• Margin-based: A scoring function g should score observed triples higher than corrupted triples by at least some margin. Equivalently, minimize the hinge loss $L + R$, where in $L$ the left entity is replaced by a random entity, $L = \sum \max(margin - g(l, t, r) + g(l', t, r), 0)$, and in $R$ the right entity is randomized, $R = \sum \max(margin - g(l, t, r) + g(l, t, r'), 0)$.
• Negative sampling: A function f is trained to distinguish triples drawn from the data distribution D from triples drawn from a uniform noise distribution N. If $f(\cdot)$ denotes the probability that a triple comes from D, we minimize the negative log-likelihood: $\sum_{D} -\log f(l, t, r) + \sum_{N} -\log (1 - f(l', t', r'))$.
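The two objectives above can be sketched in a few lines of NumPy. This is a minimal illustration with random toy embeddings, not trained ones; the variable names, sizes, and the TransE-style choice of scoring function are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, margin = 50, 1.0

# Hypothetical toy embeddings (sizes and names are assumptions, not from the text).
E = rng.normal(size=(10, dim))   # entity (synset) embeddings
T = rng.normal(size=(5, dim))    # relation embeddings

def g(l, t, r):
    # A TransE-style score: negated distance, so higher means more plausible.
    return -np.linalg.norm(E[l] + T[t] - E[r], ord=1)

def margin_loss(l, t, r, l_neg, r_neg, m=margin):
    # Hinge form of the margin-based objective: the observed triple should
    # outscore each corrupted triple by at least m.
    left = max(0.0, m - g(l, t, r) + g(l_neg, t, r))
    right = max(0.0, m - g(l, t, r) + g(l, t, r_neg))
    return left + right

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nll_loss(pos, neg):
    # Negative-sampling objective: f = sigmoid(g) models P(triple from D).
    return -np.log(sigmoid(g(*pos))) - np.log(1.0 - sigmoid(g(*neg)))
```

In practice the sums run over minibatches of observed triples and freshly sampled corruptions, and the loss is minimized by gradient descent on E and T.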

Many models have been proposed (see Yang et al., 2014 for a review):

1. Unstructured: Treats all relations identically, ignoring t: $g(l, t, r) = \|l - r\|_p$
2. RESCAL: A relation is represented by a matrix and the score is a bilinear form: $g(l, t, r) = l^\top W_t\, r$
3. SE: A relation is represented by two matrices, acting as linear transformations on each side: $g(l, t, r) = \|t_L\, l - t_R\, r\|_p$
4. SME(LINEAR): A relation is represented by two vectors; the semantic matching energy function compares the two sides via a dot product: $g(l, t, r) = (W_{L1}\, l + W_{L2}\, t_L + b_L)^\top (W_{R1}\, r + W_{R2}\, t_R + b_R)$
5. SME(BILINEAR): The representation of relations stays the same, but the weights are rank-3 tensors: $g(l, t, r) = ((W_L \times t_L)\, l)^\top ((W_R \times t_R)\, r)$
6. LFM: A relation is a bilinear operator built from a shared dictionary of rank-one matrices (Jenatton et al., 2012): $g(l, t, r) = l^\top X_t\, r$
7. TransE: A relation is a translation from the left-hand side to the right-hand side: $g(l, t, r) = \|l + t - r\|_p$
8. TransM: Scales the score with a relation-specific weight: $g(l, t, r) = w_t \|l + t - r\|_p$
9. Neural tensor network: Relations are represented as rank-3 tensors, and the score consists of two parts: the "tensor" part is the bilinear product of the two entities with the relation tensor, and the "neural" part adds a linear combination of the entities: $g(l, t, r) = l^\top T_t\, r + w_L^\top l + w_R^\top r$
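To make the translation idea behind TransE concrete, here is a small sketch. The vectors below are random illustrations (not trained embeddings): the "true" right entity is constructed to satisfy $l + t = r$ exactly, so its energy is zero, while a random corruption gets a strictly positive energy.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 20

# Hypothetical vectors (an illustration, not trained embeddings): build a
# right entity that satisfies the TransE relation l + t = r exactly.
l = rng.normal(size=dim)
t = rng.normal(size=dim)
r_true = l + t                    # ideal right-hand side under TransE
r_corrupt = rng.normal(size=dim)  # a random (corrupted) entity

def transe_energy(l, t, r, p=2):
    # TransE energy |l + t - r|_p: lower means the triple is more plausible.
    return np.linalg.norm(l + t - r, ord=p)
```

The gap between the low energy of the constructed triple and the higher energy of the corrupted one is exactly what the margin-based objective widens during training.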

### Knowledge base + Text

Wang et al. (2014) create two embedding models, one for entities and one for words, and align them using Wikipedia anchors or entity names.

Bordes et al. (2012) train embeddings on knowledge base completion and word sense disambiguation tasks simultaneously, thereby making use of both knowledge bases and text corpora.

