Changes: Recurrent neural networks

Latest revision as of 23:15, 8 April 2018

Training

Problem: gradients vanish or explode as they are backpropagated through many time steps.
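The effect can be illustrated with a minimal sketch, assuming a purely linear recurrence h_t = W h_{t-1} in which W is a scaled random orthogonal matrix. The function name `backprop_norms` and all parameter values are hypothetical and chosen only for illustration; real RNNs have nonlinearities and unstructured weights, so their gradient norms do not follow such a clean rule.

```python
import numpy as np

def backprop_norms(recurrent_scale, steps=50, size=16, seed=0):
    """Norms of a gradient vector pushed back through `steps` linear recurrences.

    Assumption: h_t = W h_{t-1} with W = recurrent_scale * Q, where Q is a
    random orthogonal matrix, so each backward step multiplies the gradient
    norm by exactly `recurrent_scale`.
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((size, size)))
    w = recurrent_scale * q
    grad = np.ones(size) / np.sqrt(size)  # unit-norm gradient at the last step
    norms = []
    for _ in range(steps):
        grad = w.T @ grad  # one step of backpropagation through time
        norms.append(np.linalg.norm(grad))
    return norms

vanishing = backprop_norms(0.9)  # norm shrinks like 0.9 ** t
exploding = backprop_norms(1.1)  # norm grows like 1.1 ** t
print(vanishing[-1], exploding[-1])
```

In this toy setting a scale slightly below 1 shrinks the gradient geometrically and a scale slightly above 1 grows it geometrically, which is the behaviour the methods below try to avoid.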

Long Short-Term Memory

Structurally constrained network

Mikolov et al. (2015)^[1] combine a feed-forward neural network with a cache model.
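A minimal sketch of one step of such a network, assuming the cache-like part is a slowly changing context layer that keeps an exponentially decaying average of the inputs and feeds a sigmoid hidden layer. The names `scrn_step`, `B`, `A`, `P`, `R`, the shapes, and the fixed decay `alpha` are assumptions for illustration and should be checked against the paper. With `R` set to zero the hidden layer is feed-forward and only the context layer carries memory.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def scrn_step(x, s_prev, h_prev, params, alpha=0.95):
    """One step of a structurally constrained recurrent network (sketch).

    s: slow context units, a decaying average of projected inputs (cache-like).
    h: fast hidden units driven by the context, the input, and the previous h.
    """
    s = (1.0 - alpha) * (params["B"] @ x) + alpha * s_prev
    h = sigmoid(params["P"] @ s + params["A"] @ x + params["R"] @ h_prev)
    return s, h

# Hypothetical sizes: 5 input symbols, 3 context units, 4 hidden units.
rng = np.random.default_rng(0)
n_in, n_ctx, n_hid = 5, 3, 4
params = {
    "B": rng.standard_normal((n_ctx, n_in)),   # input -> context
    "A": rng.standard_normal((n_hid, n_in)),   # input -> hidden
    "P": rng.standard_normal((n_hid, n_ctx)),  # context -> hidden
    "R": np.zeros((n_hid, n_hid)),             # hidden -> hidden (zero: feed-forward)
}

s, h = np.zeros(n_ctx), np.zeros(n_hid)
for token in [0, 2, 2, 1]:
    x = np.zeros(n_in)
    x[token] = 1.0  # one-hot input
    s, h = scrn_step(x, s, h, params)
```

Because the context update has no nonlinearity and a decay close to 1, the context state changes slowly, which is what lets it retain information over longer spans than the hidden layer.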

Rectified units with initialization trick

Le et al. (2015)^[2] use rectified linear units and initialize the recurrent weight matrix to the identity matrix or a scaled-down version of it.
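The initialization can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the dictionary layout, and the standard deviation of the input weights are assumptions.

```python
import numpy as np

def init_irnn(input_size, hidden_size, scale=1.0, seed=0):
    """Parameters for a ReLU RNN whose recurrent matrix is scale * identity."""
    rng = np.random.default_rng(seed)
    return {
        "W_hh": scale * np.eye(hidden_size),  # identity (or scaled-down) recurrence
        # Small random input weights; the 0.001 standard deviation is an assumption.
        "W_xh": 0.001 * rng.standard_normal((hidden_size, input_size)),
        "b_h": np.zeros(hidden_size),
    }

def irnn_step(x, h_prev, p):
    """One recurrent step with a rectified linear activation."""
    return np.maximum(0.0, p["W_hh"] @ h_prev + p["W_xh"] @ x + p["b_h"])

params = init_irnn(input_size=3, hidden_size=4)
h = np.array([0.5, 2.0, 0.0, 1.0])
# With zero input, the identity recurrence plus ReLU copies a non-negative state.
h_next = irnn_step(np.zeros(3), h, params)
```

At initialization the network simply carries its state forward, so gradients pass through time unchanged until training moves the weights away from the identity. A scale below 1 makes the state decay, which favours shorter-range dependencies.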

Limitations

Vanilla sequence-to-sequence RNNs cannot model reduplication (Prickett, 2017^[3]), although special treatment makes it possible (e.g., the copying mechanism of Gu et al., 2016^[4]; see also Alhama, 2017^[5]).

Can RNN model hierarchy?

From Gulordava et al. (2018):

"Linzen et al. (2016) directly evaluated the extent to which RNNs can approximate hierarchical structure in corpus-extracted natural language data [...]
Bernardy and Lappin (2017) observed that RNNs are better at long-distance agreement when they construct rich lexical representations of words [...]
Early work showed that RNNs can, to a certain degree, handle data generated by context-free and even context-sensitive grammars (e.g., Elman, 1991, 1993; Rohde and Plaut, 1997; Christiansen and Chater, 1999; Gers and Schmidhuber, 2001; Cartling, 2008). [...]

We tentatively conclude that LM-trained RNNs can construct abstract grammatical representations of their input. This, in turn, suggests that the input itself contains enough information to trigger some form of syntactic learning in a system, such as an RNN, that does not contain an explicit prior bias in favour of syntactic structures."

Glossary

highway connections (Srivastava et al., 2015)^[6]
SRU = Simple Recurrent Unit (Lei et al. 2017)^[7]
QRNN = quasi-recurrent neural network (Bradbury et al. 2016)^[8]
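As an illustration of the first glossary entry, a highway connection mixes a transformed signal with the unchanged input through a learned gate. The sketch below assumes a tanh transform and a coupled carry gate (1 - t); the function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """y = t * H(x) + (1 - t) * x, with transform gate t = sigmoid(W_t x + b_t)."""
    h = np.tanh(W_h @ x + b_h)    # candidate transformation H(x)
    t = sigmoid(W_t @ x + b_t)    # transform gate, elementwise in (0, 1)
    return t * h + (1.0 - t) * x  # gate closed (t near 0) passes x through

rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)
W_h, b_h = rng.standard_normal((n, n)), np.zeros(n)
W_t, b_t = np.zeros((n, n)), np.full(n, -50.0)  # strongly negative bias closes the gate
y = highway_layer(x, W_h, b_h, W_t, b_t)        # approximately equal to x
```

A negative gate bias makes the layer start out close to the identity function, which is why such connections ease the training of deep stacks.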

References

[1] Mikolov, T., Joulin, A., Chopra, S., Mathieu, M., & Ranzato, M. A. (2014). Learning Longer Memory in Recurrent Neural Networks. arXiv preprint arXiv:1412.7753.
[2] Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton, 2015. A Simple Way to Initialize Recurrent Networks of Rectified Linear Units. URL
[3] Prickett, Brandon. "Vanilla Sequence-to-Sequence Neural Nets cannot Model Reduplication." (2017).
[4] Gu, Jiatao, et al. "Incorporating copying mechanism in sequence-to-sequence learning." arXiv preprint arXiv:1603.06393 (2016).
[5] Garrido Alhama, R. "Computational modelling of Artificial Language Learning." (2017).
[6] Rupesh K Srivastava, Klaus Greff, and Jürgen Schmidhuber. Training very deep networks. In Advances in neural information processing systems, pp. 2377–2385, 2015.
[7] Lei, T., Zhang, Y., & Artzi, Y. (2017). Training RNNs as Fast as CNNs. Retrieved from http://arxiv.org/abs/1709.02755
[8] Bradbury, James, Stephen Merity, Caiming Xiong, and Richard Socher. "Quasi-recurrent neural networks." arXiv preprint arXiv:1611.01576 (2016).

@@ Line 16: / Line 16: @@
 == Limitations ==
 * Vanilla sequence-to-sequence RNNs cannot model reduplication (Prickett, 2017<ref>Prickett, Brandon. "Vanilla Sequence-to-Sequence Neural Nets cannot Model Reduplication." (2017).</ref>), although special treatment makes it possible (e.g., the copying mechanism of Gu et al., 2016<ref>Gu, Jiatao, et al. "Incorporating copying mechanism in sequence-to-sequence learning." ''arXiv preprint arXiv:1603.06393'' (2016).</ref>; see also Alhama, 2017<ref>Garrido Alhama, R. "Computational modelling of Artificial Language Learning." (2017).</ref>).
+== Can RNN model hierarchy? ==
+From Gulordava et al. (2018): <blockquote>"Linzen et al. (2016) directly evaluated the extent to which RNNs can approximate hierarchical structure in corpus-extracted natural language data [...]
+Bernardy and Lappin (2017) observed that RNNs are better at long-distance agreement when they construct rich lexical representations of words [...]
+Early work showed that RNNs can, to a certain degree, handle data generated by context-free and even context-sensitive grammars (e.g., Elman, 1991, 1993; Rohde and Plaut, 1997; Christiansen and Chater, 1999; Gers and Schmidhuber, 2001; Cartling, 2008). [...]</blockquote><blockquote>We tentatively conclude that LM-trained RNNs can construct abstract grammatical representations of their input. This, in turn, suggests that the input itself contains enough information to trigger some form of syntactic learning in a system, such as an RNN, that does not contain an explicit prior bias in favour of syntactic structures."</blockquote>
+== Glossary ==
+* highway connections (Srivastava et al., 2015)<ref>Rupesh K Srivastava, Klaus Greff, and Jürgen Schmidhuber. Training very deep networks. In Advances in neural information processing systems, pp. 2377–2385, 2015.</ref>
+* SRU = Simple Recurrent Unit (Lei et al. 2017)<ref>Lei, T., Zhang, Y., & Artzi, Y. (2017). Training RNNs as Fast as CNNs. Retrieved from http://arxiv.org/abs/1709.02755</ref>
+* QRNN = quasi-recurrent neural network (Bradbury et al. 2016)<ref>Bradbury, James, Stephen Merity, Caiming Xiong, and Richard Socher. "Quasi-recurrent neural networks." ''arXiv preprint arXiv:1611.01576'' (2016).</ref>
 == References ==