Main idea: assign similar vectors to similar words from the start, instead of starting from random vectors (see Goldberg (2015)[1], Sections 5.1-5.3).
Chen and Manning (2014)[2]: "Figure 4 (middle) shows that using pre-trained word embeddings can obtain around 0.7% improvement on PTB and 1.7% improvement on CTB, compared with using random initialization within (−0.01, 0.01)."
Lebret et al. (2015)[3]: "This task confirms the importance of embedding fine-tuning for NLP tasks with a high semantic component. We note that our tuned embeddings leads to a performance gain of about 1% or 2% for NER, while the gain is between about 4% and 8% for the movie review."
Pei et al. (2014)[4]: "Previous work found that the performance can be improved by pre-training the character embeddings on large unlabeled data and using the obtained embeddings to initialize the character lookup table instead of random initialization (Mansur et al., 2013; Zheng et al., 2013). [...] We pre-train the embeddings on the Chinese Giga-word corpus (Graff and Chen, 2005). As shown in Table 5 (last three rows), both the F-score and OOV recall of our model boost by using pre-training."
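The initialization scheme these excerpts compare can be sketched as follows. This is a minimal illustration, not the code of any of the cited papers: the function name `init_embeddings`, the toy vocabulary, and the 3-dimensional "pre-trained" vectors are all hypothetical. Every row of the lookup table starts as uniform noise in (−0.01, 0.01), the random baseline quoted from Chen and Manning (2014), and rows for words that have a pre-trained vector are then overwritten with it.

```python
import numpy as np

def init_embeddings(vocab, pretrained, dim, scale=0.01, seed=0):
    """Build a len(vocab) x dim lookup table.

    Rows start as uniform noise in [-scale, scale); rows for words
    found in `pretrained` are overwritten with their pre-trained vectors.
    Returns the table and the number of words that were covered.
    """
    rng = np.random.default_rng(seed)
    table = rng.uniform(-scale, scale, size=(len(vocab), dim))
    hits = 0
    for row, word in enumerate(vocab):
        vec = pretrained.get(word)
        if vec is not None:
            table[row] = vec
            hits += 1
    return table, hits

# Toy data (hypothetical): two words have pre-trained vectors, one does not.
pretrained = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.0]),
}
vocab = ["cat", "dog", "zyzzyva"]

table, hits = init_embeddings(vocab, pretrained, dim=3)
print(f"{hits}/{len(vocab)} words initialized from pre-trained vectors")
```

Whether the table is then updated during training (fine-tuning, as in the Lebret et al. excerpt) or kept fixed is a separate choice that this sketch does not cover.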
See also
Re-embedding words (Labutov and Lipson, 2013)[5]: an early idea from the period when word embeddings were first being explored; it was rarely used in later work.
References
- Goldberg, Yoav. "A Primer on Neural Network Models for Natural Language Processing." 2015, pp. 1–76.
- Chen, Danqi, and Christopher D. Manning. "A Fast and Accurate Dependency Parser using Neural Networks." EMNLP. 2014.
- Lebret, Rémi, Joël Legrand, and Ronan Collobert. "Is deep learning really necessary for word embeddings?" EPFL-REPORT-196986. Idiap, 2013.
- Pei, Wenzhe, Tao Ge, and Baobao Chang. "Max-Margin Tensor Neural Network for Chinese Word Segmentation." ACL (1). 2014.
- Labutov, Igor, and Hod Lipson. "Re-embedding words." ACL (2). 2013.