Latest revision as of 23:15, 8 April 2018
Training
Problem: vanishing or exploding gradients.
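Why the problem arises is easiest to see in a toy case. This case is an illustrative assumption, not a model from this page: a scalar linear recurrence h_t = w·h_{t−1} + x_t. Backpropagating from step T to step 0 multiplies the gradient by ∂h_t/∂h_{t−1} = w once per step, so the result is w^T. It shrinks or grows geometrically unless |w| = 1. With a nonlinearity such as tanh, each factor is further scaled by the activation's derivative. With a weight matrix, roughly the same role is played by the size (e.g. largest singular value) of the per-step Jacobian.

```python
def gradient_through_time(w: float, steps: int) -> float:
    """Return dh_T/dh_0 for the toy recurrence h_t = w * h_{t-1} + x_t.

    Each step contributes the factor dh_t/dh_{t-1} = w (chain rule),
    so the gradient after `steps` steps is w ** steps.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= w  # multiply by the per-step Jacobian
    return grad


for w in (0.9, 1.0, 1.1):
    print(w, gradient_through_time(w, 100))
```

Over 100 steps, w = 0.9 gives a gradient of about 3e-5 (vanishing) and w = 1.1 gives one above 1e4 (exploding). Only w = 1 keeps it at 1.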
Long Short-Term Memory
Structurally constrained network
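Mikolov et al. (2014)[1] combine a feed-forward NN with a cache model. They keep part of the recurrent weight matrix close to the identity, so some hidden units change their state slowly and act as a longer-term memory.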
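The slowly changing units can be sketched as below. This assumes the structurally constrained recurrent network (SCRN) update as we read it in the paper: s_t = (1 − α) B x_t + α s_{t−1} with a fixed α, where the full model also has ordinary hidden units h_t = σ(P s_t + A x_t + R h_{t−1}) and the output reads from both. Only s_t is shown here, and the default α = 0.95 and the toy vocabulary are illustrative choices. With one-hot words and B = I, s_t is an exponentially decayed bag of the words seen so far, which is what makes it behave like a cache.

```python
from typing import List

Vec = List[float]
Mat = List[List[float]]


def scrn_context(inputs: List[Vec], b: Mat, alpha: float = 0.95) -> List[Vec]:
    """Slow context units s_t = (1 - alpha) * B x_t + alpha * s_{t-1}, with s_0 = 0.

    inputs: sequence of input vectors (e.g. one-hot words).
    b:      context projection matrix B, one row per context unit.
    alpha:  fixed decay; values close to 1 make the units change slowly.
    """
    s = [0.0] * len(b)
    states = []
    for x in inputs:
        bx = [sum(w * xi for w, xi in zip(row, x)) for row in b]
        # Exponential moving average: older inputs fade by a factor alpha per step.
        s = [(1.0 - alpha) * new + alpha * old for new, old in zip(bx, s)]
        states.append(s)
    return states


# Toy run: 3-word vocabulary, B = identity, sequence w0 w1 w1 w2.
vocab = ["w0", "w1", "w2"]
one_hot = {w: [1.0 if i == j else 0.0 for j in range(3)] for i, w in enumerate(vocab)}
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
final = scrn_context([one_hot[w] for w in ["w0", "w1", "w1", "w2"]], identity)[-1]
print(final)  # decayed bag of words: repeated and recent words weigh more
```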
Rectified units with initialization trick
Le et al. (2015)[2] use rectified linear units and initialize the recurrent weight matrix to the identity matrix or a scaled-down version of it.
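The effect of the initialization can be checked directly. The sketch below is minimal and assumes zero biases and zero input, so only the recurrence acts. With the identity, a ReLU network copies its nonnegative hidden state forward unchanged, and the gradient through active units is the identity, so nothing vanishes at initialization. A scaled-down identity makes the state decay instead. The helper names are ours and hypothetical.

```python
from typing import List

Vec = List[float]
Mat = List[List[float]]


def scaled_identity(n: int, scale: float = 1.0) -> Mat:
    """Recurrent matrix initialized to scale * I (scale=1.0 gives the identity)."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu_rnn_step(h: Vec, x: Vec, w_hh: Mat, w_xh: Mat, b: Vec) -> Vec:
    """One step of h_t = ReLU(W_hh h_{t-1} + W_xh x_t + b)."""
    pre = [r + i + bi for r, i, bi in zip(matvec(w_hh, h), matvec(w_xh, x), b)]
    return [max(0.0, p) for p in pre]


def run_without_input(h0: Vec, steps: int, scale: float) -> Vec:
    """Iterate the step with zero input and zero bias to isolate the recurrence."""
    n = len(h0)
    w_hh = scaled_identity(n, scale)
    w_xh = [[0.0] for _ in range(n)]  # 1-dimensional input; weights are zero here
    b = [0.0] * n
    h = h0
    for _ in range(steps):
        h = relu_rnn_step(h, [0.0], w_hh, w_xh, b)
    return h


print(run_without_input([1.0, 2.0], 50, 1.0))  # state carried forward unchanged
print(run_without_input([1.0, 2.0], 50, 0.5))  # state decays towards zero
```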
Limitations
- Vanilla sequence-to-sequence networks cannot model reduplication (Prickett 2017[3]), though it becomes possible with special treatment (Gu et al. 2016[4]; Alhama 2017[5]).
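One kind of special treatment is a copying mechanism. The toy sketch below shows only the output layer, simplified from our reading of Gu et al. (2016): generate-mode scores over a fixed vocabulary and copy-mode scores over source positions share one softmax. As a result, a symbol that appears only in the input can still be emitted. That is one plausible reason copying helps with reduplication of novel forms. The scores are set by hand here, whereas a real model would compute them from the decoder state. The syllables and function name are hypothetical.

```python
import math
from typing import Dict, List


def copy_generate_distribution(gen_scores: Dict[str, float],
                               copy_scores: List[float],
                               source_tokens: List[str]) -> Dict[str, float]:
    """Mix generate-mode and copy-mode scores under a single softmax.

    gen_scores:    score for each token in the fixed output vocabulary.
    copy_scores:   one score per source position.
    source_tokens: the token at each source position (same length as copy_scores).
    """
    unnormalized: Dict[str, float] = {}
    for token, score in gen_scores.items():
        unnormalized[token] = unnormalized.get(token, 0.0) + math.exp(score)
    for token, score in zip(source_tokens, copy_scores):
        # Copying adds mass to whatever token sits at that source position,
        # including tokens absent from the output vocabulary.
        unnormalized[token] = unnormalized.get(token, 0.0) + math.exp(score)
    z = sum(unnormalized.values())
    return {token: u / z for token, u in unnormalized.items()}


# Hypothetical step in reduplicating the unseen form "tu pi" -> "tu pi tu pi":
# the decoder has emitted "tu pi" and now attends strongly to source position 0.
dist = copy_generate_distribution({"ba": 0.0, "ka": 0.0}, [2.0, 0.0], ["tu", "pi"])
print(max(dist, key=dist.get))  # → tu
```

A pure generate-mode softmax could never output "tu" here, because it is not in the output vocabulary.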
Can RNNs model hierarchy?
From Gulordava et al. (2018):
"Linzen et al. (2016) directly evaluated the extent to which RNNs can approximate hierarchical structure in corpus-extracted natural language data [...]
Bernardy and Lappin (2017) observed that RNNs are better at long-distance agreement when they construct rich lexical representations of words [...]
Early work showed that RNNs can, to a certain degree, handle data generated by context-free and even context-sensitive grammars (e.g., Elman, 1991, 1993; Rohde and Plaut, 1997; Christiansen and Chater, 1999; Gers and Schmidhuber, 2001; Cartling, 2008). [...]
We tentatively conclude that LM-trained RNNs can construct abstract grammatical representations of their input. This, in turn, suggests that the input itself contains enough information to trigger some form of syntactic learning in a system, such as an RNN, that does not contain an explicit prior bias in favour of syntactic structures."
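In the language-modelling setting, these agreement studies ask whether the model prefers the grammatical verb form over the ungrammatical one after a given prefix. The sketch below assumes a hypothetical `score` callable standing in for a trained RNN language model's next-word log-probability. It also includes a toy baseline that agrees with the nearest noun. That baseline fails exactly on sentences with an intervening "attractor" noun, which is why such sentences probe hierarchical rather than linear structure. The lexicon and sentences are illustrative.

```python
from typing import Callable, List, Tuple

# (prefix up to the verb, grammatical verb form, ungrammatical verb form)
Item = Tuple[List[str], str, str]
Scorer = Callable[[List[str], str], float]


def agreement_accuracy(items: List[Item], score: Scorer) -> float:
    """Fraction of items where the scorer prefers the grammatical verb form."""
    correct = sum(1 for prefix, good, bad in items
                  if score(prefix, good) > score(prefix, bad))
    return correct / len(items)


# Hypothetical toy lexicon with number features.
NOUN_NUMBER = {"key": "sg", "keys": "pl", "cabinet": "sg", "cabinets": "pl"}
VERB_NUMBER = {"is": "sg", "are": "pl"}


def nearest_noun_score(prefix: List[str], verb: str) -> float:
    """Linear heuristic: agree with the most recent noun, ignoring structure."""
    for word in reversed(prefix):
        if word in NOUN_NUMBER:
            return 1.0 if NOUN_NUMBER[word] == VERB_NUMBER[verb] else 0.0
    return 0.0


items: List[Item] = [
    ("the keys".split(), "are", "is"),                 # no intervening noun
    ("the keys to the cabinet".split(), "are", "is"),  # singular attractor
    ("the key to the cabinets".split(), "is", "are"),  # plural attractor
]
print(agreement_accuracy(items, nearest_noun_score))
```

The nearest-noun heuristic gets only the first item right (accuracy 1/3). A model that tracks the subject would have to prefer "are", "are" and "is".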
Glossary
- highway connections (Srivastava et al., 2015)[6]
- SRU = Simple Recurrent Unit (Lei et al. 2017)[7]
- QRNN = quasi-recurrent neural network (Bradbury et al. 2016)[8]
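The three glossary entries are related. Both SRU and QRNN use the same cheap elementwise recurrence c_t = f_t ⊙ c_{t−1} + (1 − f_t) ⊙ z_t, and the SRU output is a highway connection. The sketch below is minimal pure Python and rests on three assumptions. First, highway layers are taken as y = T(x)·H(x) + (1 − T(x))·x. Second, the SRU follows our reading of the first version of Lei et al. (2017): c_t = f_t ⊙ c_{t−1} + (1 − f_t) ⊙ W x_t and h_t = r_t ⊙ tanh(c_t) + (1 − r_t) ⊙ x_t, with gates that depend only on x_t. Third, the QRNN uses "fo-pooling" with a causal convolution of width 2 and no biases. Helper names such as `forget_pool` are ours, not from any library.

```python
import math
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def sigmoid(v: Vec) -> Vec:
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(a: Vec, b: Vec) -> Vec:
    return [x + y for x, y in zip(a, b)]


def highway(h: Vec, x: Vec, t: Vec) -> Vec:
    """Highway connection t * H(x) + (1 - t) * x, with h = H(x) and gate t."""
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def forget_pool(c_prev: Vec, z: Vec, f: Vec) -> Vec:
    """Elementwise recurrence shared by SRU and QRNN: f*c_{t-1} + (1-f)*z_t."""
    return [fi * ci + (1.0 - fi) * zi for fi, ci, zi in zip(f, c_prev, z)]


def sru(xs: List[Vec], w: Mat, wf: Mat, bf: Vec, wr: Mat, br: Vec) -> List[Vec]:
    """SRU layer. Gates see only x_t, so every matvec could be batched over time.

    Input and hidden sizes must match because the highway term carries x_t through.
    """
    c = [0.0] * len(w)
    hs = []
    for x in xs:
        z = matvec(w, x)                     # candidate W x_t
        f = sigmoid(add(matvec(wf, x), bf))  # forget gate
        r = sigmoid(add(matvec(wr, x), br))  # reset gate, used as highway gate
        c = forget_pool(c, z, f)             # the only sequential dependency
        hs.append(highway([math.tanh(ci) for ci in c], x, r))
    return hs


def causal_conv2(pair: Tuple[Mat, Mat], prev: Vec, x: Vec) -> Vec:
    """Width-2 masked convolution: W_prev x_{t-1} + W_cur x_t."""
    return add(matvec(pair[0], prev), matvec(pair[1], x))


def qrnn(xs: List[Vec], wz: Tuple[Mat, Mat], wf: Tuple[Mat, Mat],
         wo: Tuple[Mat, Mat]) -> List[Vec]:
    """QRNN layer with fo-pooling; each weight argument is (W_prev, W_cur)."""
    c = [0.0] * len(wz[1])
    prev = [0.0] * len(xs[0])  # zero padding before the first step
    hs = []
    for x in xs:
        z = [math.tanh(v) for v in causal_conv2(wz, prev, x)]
        f = sigmoid(causal_conv2(wf, prev, x))
        o = sigmoid(causal_conv2(wo, prev, x))
        c = forget_pool(c, z, f)
        hs.append([oi * ci for oi, ci in zip(o, c)])
        prev = x
    return hs
```

In both layers the expensive matrix products depend only on the inputs, so they can be computed for all time steps at once. Only the elementwise `forget_pool` loop has to run step by step, which is the main source of their speed advantage over LSTMs.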
References
1. Mikolov, T., Joulin, A., Chopra, S., Mathieu, M., and Ranzato, M. A. "Learning Longer Memory in Recurrent Neural Networks." arXiv preprint arXiv:1412.7753 (2014).
2. Le, Quoc V., Navdeep Jaitly, and Geoffrey E. Hinton. "A Simple Way to Initialize Recurrent Networks of Rectified Linear Units." (2015).
3. Prickett, Brandon. "Vanilla Sequence-to-Sequence Neural Nets cannot Model Reduplication." (2017).
4. Gu, Jiatao, et al. "Incorporating copying mechanism in sequence-to-sequence learning." arXiv preprint arXiv:1603.06393 (2016).
5. Garrido Alhama, R. "Computational modelling of Artificial Language Learning." (2017).
6. Srivastava, Rupesh K., Klaus Greff, and Jürgen Schmidhuber. "Training very deep networks." Advances in Neural Information Processing Systems, pp. 2377–2385 (2015).
7. Lei, T., Zhang, Y., and Artzi, Y. "Training RNNs as Fast as CNNs." http://arxiv.org/abs/1709.02755 (2017).
8. Bradbury, James, Stephen Merity, Caiming Xiong, and Richard Socher. "Quasi-recurrent neural networks." arXiv preprint arXiv:1611.01576 (2016).