Problem: vanishing or exploding gradients when backpropagating through many time steps.
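A minimal NumPy sketch of the problem (the matrix, sizes, and sequence length are illustrative assumptions): backpropagation multiplies the gradient by the recurrent Jacobian once per time step, so its norm shrinks or grows geometrically depending on whether the spectral radius of the recurrent matrix is below or above 1.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100  # sequence length (illustrative)

for scale in (0.9, 1.1):                # spectral radius below vs. above 1
    W = scale * np.eye(8)               # toy recurrent Jacobian
    grad = rng.standard_normal(8)       # gradient arriving at the final step
    for _ in range(T):
        grad = W.T @ grad               # one backprop step through the recurrence
    print(scale, np.linalg.norm(grad))  # ~1e-4 (vanished) vs. ~1e4 (exploded)
```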
Long Short-Term Memory
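The LSTM's gating addresses this: the cell state is updated additively, so gradients can flow through the forget gate without being repeatedly squashed. A minimal single-step sketch in NumPy (weight shapes and names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, Wx, Wh, b):
    """One LSTM step; Wx: (4H, D), Wh: (4H, H), b: (4H,)."""
    z = Wx @ x + Wh @ h + b
    i, f, o, g = np.split(z, 4)        # input, forget, output gates + candidate
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c + i * g                  # additive cell update: gradients pass through f
    h = o * np.tanh(c)
    return h, c
```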
Structurally constrained network
Mikolov et al. (2014) combine a feed-forward NN with a cache model: in their structurally constrained recurrent network (SCRN), a slow context layer has its recurrent matrix fixed to a scaled identity.
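A one-step sketch of the SCRN (matrix shapes and names are illustrative): the "slow" context units behave like an exponentially decaying cache of the input, while a standard "fast" hidden layer sits on top.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scrn_step(x, h, s, A, R, P, B, alpha=0.95):
    """One SCRN step (Mikolov et al., 2014); shapes are illustrative."""
    s = (1 - alpha) * (B @ x) + alpha * s  # context layer: recurrent matrix fixed to alpha * I
    h = sigmoid(A @ x + R @ h + P @ s)     # fast hidden layer with ordinary recurrence
    return h, s
```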
Rectified units with initialization trick
Le et al. (2015) use rectified linear units and initialize the recurrent weight matrix to the identity matrix or a scaled-down version of it.
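A sketch of that initialization trick (sizes are illustrative): with ReLU units and an identity recurrent matrix, the network starts out simply copying its hidden state forward, which avoids shrinking gradients at initialization.

```python
import numpy as np

H, D = 128, 64                      # hidden and input sizes (illustrative)
rng = np.random.default_rng(0)

Whh = np.eye(H)                     # identity init (or e.g. 0.01 * np.eye(H) for the scaled variant)
Wxh = rng.standard_normal((H, D)) * 0.001
b = np.zeros(H)

def irnn_step(x, h):
    return np.maximum(0.0, Wxh @ x + Whh @ h + b)  # ReLU units; identity recurrence copies h forward
```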
- Cannot model reduplication: Prickett (2017) shows that vanilla sequence-to-sequence nets fail at it, though it becomes possible with special treatment such as a copying mechanism (Gu et al., 2016; Alhama, 2017); see the sketch below.
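A pointer-style sketch in the spirit of Gu et al.'s copying mechanism (all names and numbers here are illustrative, not the paper's exact parameterization): the output distribution mixes ordinary generation probabilities with attention mass copied onto source tokens, which makes repeating input material easy.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

V = 1000                                  # vocabulary size (illustrative)
src = [17, 42, 17]                        # source token ids; note the repeated 17

gen_logits = np.random.randn(V)           # decoder's generation scores (stand-in)
copy_scores = np.random.randn(len(src))   # attention scores over source positions
p_gen = 0.6                               # mixing weight (would be predicted by the model)

p = p_gen * softmax(gen_logits)           # generation part of the distribution
attn = softmax(copy_scores)
for pos, tok in enumerate(src):
    p[tok] += (1 - p_gen) * attn[pos]     # copy probability mass onto source tokens
assert np.isclose(p.sum(), 1.0)
```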
Can RNN model hierarchy?
From Gulordava et al. (2018):
"Linzen et al. (2016) directly evaluated the ex- tent to which RNNs can approximate hierarchi- cal structure in corpus-extracted natural language data [...]
Bernardy and Lappin (2017) observed that RNNs are better at long-distance agreement when they construct rich lexical representations of words [...]
Early work showed that RNNs can, to a certain degree, handle data generated by context-free and even context-sensitive grammars (e.g., Elman, 1991, 1993; Rohde and Plaut, 1997; Christiansen and Chater, 1999; Gers and Schmidhuber, 2001; Cartling, 2008). [...]
We tentatively conclude that LM-trained RNNs can construct abstract grammatical representations of their input. This, in turn, suggests that the input itself contains enough information to trigger some form of syntactic learning in a system, such as an RNN, that does not contain an explicit prior bias in favour of syntactic structures."
- highway connections (Srivastava et al., 2015)
- SRU = Simple Recurrent Unit (Lei et al., 2017); see the sketch after this list
- QRNN = quasi-recurrent neural network (Bradbury et al., 2016)
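For concreteness, a single-step sketch of the SRU recurrence from Lei et al. (2017), which also illustrates the highway-style connection of Srivastava et al. (2015); weight names and shapes are illustrative. The elementwise recurrence has no hidden-to-hidden matrix multiply, which is what makes the SRU fast, and the gate r implements a highway skip from the input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_step(x, c, W, Wf, bf, Wr, br):
    """One SRU step (Lei et al., 2017); all matrices (H, H), illustrative.
    The highway output assumes hidden size == input size (else project x)."""
    f = sigmoid(Wf @ x + bf)           # forget gate, depends on the input only
    r = sigmoid(Wr @ x + br)           # reset/highway gate
    c = f * c + (1 - f) * (W @ x)      # elementwise recurrence: no h-to-h matmul
    h = r * np.tanh(c) + (1 - r) * x   # highway connection (Srivastava et al., 2015)
    return h, c
```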
- Mikolov, T., Joulin, A., Chopra, S., Mathieu, M., & Ranzato, M. A. (2014). Learning Longer Memory in Recurrent Neural Networks. arXiv preprint arXiv:1412.7753.
- Le, Q. V., Jaitly, N., & Hinton, G. E. (2015). A Simple Way to Initialize Recurrent Networks of Rectified Linear Units. arXiv preprint arXiv:1504.00941.
- Prickett, B. (2017). Vanilla Sequence-to-Sequence Neural Nets cannot Model Reduplication.
- Gu, J., Lu, Z., Li, H., & Li, V. O. K. (2016). Incorporating Copying Mechanism in Sequence-to-Sequence Learning. arXiv preprint arXiv:1603.06393.
- Garrido Alhama, R. (2017). Computational Modelling of Artificial Language Learning.
- Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Training Very Deep Networks. In Advances in Neural Information Processing Systems, pp. 2377–2385.
- Lei, T., Zhang, Y., & Artzi, Y. (2017). Training RNNs as Fast as CNNs. arXiv preprint arXiv:1709.02755.
- Bradbury, J., Merity, S., Xiong, C., & Socher, R. (2016). Quasi-Recurrent Neural Networks. arXiv preprint arXiv:1611.01576.