Training

Problem: gradients vanish or explode when errors are backpropagated through many time steps.
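
A rough illustration of why this happens: backpropagating through T time steps of a (linearized) recurrence multiplies the gradient by the recurrent Jacobian T times, so its norm shrinks towards zero or blows up depending on the spectral radius of the recurrent matrix. The Python sketch below makes this concrete; the matrix, sizes and step count are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
grad = rng.normal(size=8)        # some upstream gradient w.r.t. the last hidden state
T = 50                           # number of time steps to backpropagate through

for scale in (0.5, 1.5):         # spectral radius below vs. above 1
    W = scale * np.eye(8)        # toy recurrent matrix with a known spectral radius
    g = grad.copy()
    for _ in range(T):
        g = W.T @ g              # one step of backpropagation through time (linear case)
    print(f"spectral radius {scale}: gradient norm after {T} steps = {np.linalg.norm(g):.2e}")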

Long Short-Term Memory
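
In brief, the LSTM keeps an additive memory cell, c_t = f_t * c_(t-1) + i_t * g_t, controlled by input, forget and output gates, which is what lets gradients survive over long spans. Below is a minimal sketch of one standard LSTM cell step; parameter names, shapes and the toy usage at the end are purely illustrative.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. x: input; h, c: previous hidden and cell state; W, U, b: stacked gate parameters."""
    z = W @ x + U @ h + b                      # pre-activations of all four gates at once
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_new = f * c + i * g                      # additive cell-state update
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# toy usage with arbitrary sizes (input dim 3, hidden dim 4)
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(16, 3)), rng.normal(size=(16, 4)), np.zeros(16)
h, c = lstm_step(np.ones(3), np.zeros(4), np.zeros(4), W, U, b)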

Structurally constrained network

Mikolov et al. (2014)[1] propose the structurally constrained recurrent network (SCRN): part of the recurrent weight matrix is fixed close to the identity, so the corresponding context units change slowly and act like a cache model of the longer history alongside the ordinary hidden layer.
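
A rough sketch of that idea, assuming the context-layer update s_t = (1 - alpha) * B x_t + alpha * s_(t-1); parameter names, shapes and the value of alpha here are illustrative rather than taken from the paper.

import numpy as np

def scrn_step(x, h, s, A, R, B, P, alpha=0.95):
    """x: input embedding; h: fast hidden state; s: slowly changing context state."""
    s_new = (1.0 - alpha) * (B @ x) + alpha * s     # fixed, identity-like recurrence: a cache-style memory
    h_new = np.tanh(A @ x + R @ h + P @ s_new)      # ordinary recurrent update, conditioned on the context
    return h_new, s_new

# toy usage with arbitrary sizes (input dim 3, hidden dim 4, context dim 2)
rng = np.random.default_rng(0)
A, R, B, P = (rng.normal(size=shape) for shape in [(4, 3), (4, 4), (2, 3), (4, 2)])
h, s = scrn_step(rng.normal(size=3), np.zeros(4), np.zeros(2), A, R, B, P)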

Rectified units with initialization trick

Le et al. (2015)[2] use rectified linear units and initialize the recurrent weight matrix to the identity matrix, or a scaled-down version of it.
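
A minimal sketch of that trick, written with PyTorch purely for illustration (layer sizes are arbitrary): a vanilla ReLU RNN whose recurrent weights start at the identity, or a scaled-down version of it, and whose biases start at zero.

import torch
import torch.nn as nn

hidden_size = 128
rnn = nn.RNN(input_size=64, hidden_size=hidden_size, nonlinearity='relu', batch_first=True)

with torch.no_grad():
    rnn.weight_hh_l0.copy_(torch.eye(hidden_size))            # identity recurrent matrix
    # rnn.weight_hh_l0.copy_(0.01 * torch.eye(hidden_size))   # ... or a scaled-down version
    rnn.bias_hh_l0.zero_()
    rnn.bias_ih_l0.zero_()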

Limitations

  • cannot model reduplication: Prickett (2017)[3] shows that vanilla sequence-to-sequence networks fail at it, though it becomes possible with special treatment (e.g., the copying mechanism of Gu et al., 2016[4]; see also Alhama, 2017[5]); a toy example of the task follows below
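
To make the task concrete: total reduplication maps a stem to two copies of itself, e.g. 'ba' -> 'baba'. The toy generator below is a minimal rendering of the pattern, not the setup of the cited papers; the question is whether a sequence-to-sequence model trained on such pairs also copies stems it has never seen.

import random

def make_reduplication_data(n=1000, alphabet="abcdefgh", max_len=5, seed=0):
    """Generate (stem, stem+stem) pairs for a toy total-reduplication task."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        stem = "".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
        pairs.append((stem, stem + stem))      # target is the stem repeated twice
    return pairs

print(make_reduplication_data(n=3))            # a few (input, output) training pairs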

Can RNN model hierarchy?

From Gulordava et al. (2018):

"Linzen et al. (2016) directly evaluated the ex- tent to which RNNs can approximate hierarchi- cal structure in corpus-extracted natural language data [...]

Bernardy and Lappin (2017) observed that RNNs are better at long-distance agreement when they construct rich lexical representations of words [...]

Early work showed that RNNs can, to a certain degree, handle data generated by context-free and even context-sensitive grammars (e.g., Elman, 1991, 1993; Rohde and Plaut, 1997; Christiansen and Chater, 1999; Gers and Schmidhuber, 2001; Cartling, 2008). [...]

We tentatively conclude that LM-trained RNNs can construct abstract grammatical representations of their input. This, in turn, suggests that the input itself contains enough information to trigger some form of syntactic learning in a system, such as an RNN, that does not contain an explicit prior bias in favour of syntactic structures."
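
As a concrete picture of the agreement evaluation mentioned in the quote (Linzen et al., 2016): the model reads a prefix in which the subject and a nearby "attractor" noun differ in number, and one checks whether it assigns a higher probability to the correctly inflected verb. The sketch below uses a tiny untrained LSTM language model as a stand-in for a trained one; vocabulary, sizes and the example sentence are all illustrative.

import torch
import torch.nn as nn

vocab = sorted({"<bos>", "the", "keys", "to", "cabinet", "are", "is"})
stoi = {w: i for i, w in enumerate(vocab)}

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)
    def forward(self, ids):
        h, _ = self.rnn(self.emb(ids))
        return self.out(h)                     # next-word logits at every position

lm = TinyLM(len(stoi))
prefix = ["<bos>", "the", "keys", "to", "the", "cabinet"]   # head noun "keys" (plural), attractor "cabinet" (singular)
ids = torch.tensor([[stoi[w] for w in prefix]])
log_probs = torch.log_softmax(lm(ids)[0, -1], dim=-1)       # distribution over the next word
# A model that tracks the head noun rather than the attractor should prefer the plural verb:
print("log P(are) =", log_probs[stoi['are']].item(), " log P(is) =", log_probs[stoi['is']].item())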

Glossary

  • highway connections (Srivastava et al., 2015)[6]: gated skip connections that mix a layer's transformed output with its unchanged input (see the sketch after this list)
  • SRU = Simple Recurrent Unit (Lei et al., 2017)[7]: a recurrent unit whose matrix multiplications depend only on the current input, not on the previous hidden state, so they can be computed in parallel across time steps
  • QRNN = quasi-recurrent neural network (Bradbury et al., 2016)[8]: alternates convolutions applied in parallel across time with a lightweight recurrent pooling function
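
A minimal sketch of a highway layer in the standard formulation (PyTorch used purely for illustration): a transform gate T(x) decides, per dimension, how much of the transformed output H(x) versus the raw input x is passed on; the negative gate-bias initialization is the commonly recommended way to favour carrying the input through early in training.

import torch
import torch.nn as nn

class Highway(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.H = nn.Linear(dim, dim)           # candidate transformation (nonlinearity choice varies)
        self.T = nn.Linear(dim, dim)           # transform gate
        self.T.bias.data.fill_(-2.0)           # start with the gate mostly closed, i.e. carry the input
    def forward(self, x):
        t = torch.sigmoid(self.T(x))
        return t * torch.relu(self.H(x)) + (1.0 - t) * x

y = Highway(64)(torch.randn(8, 64))            # output has the same shape as the input: (8, 64)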

References

  1. Mikolov, T., Joulin, A., Chopra, S., Mathieu, M., & Ranzato, M. A. (2014). Learning Longer Memory in Recurrent Neural Networks. arXiv preprint arXiv:1412.7753.
  2. Le, Q. V., Jaitly, N., & Hinton, G. E. (2015). A Simple Way to Initialize Recurrent Networks of Rectified Linear Units. arXiv preprint arXiv:1504.00941.
  3. Prickett, B. (2017). Vanilla Sequence-to-Sequence Neural Nets cannot Model Reduplication.
  4. Gu, J., et al. (2016). Incorporating Copying Mechanism in Sequence-to-Sequence Learning. arXiv preprint arXiv:1603.06393.
  5. Garrido Alhama, R. (2017). Computational Modelling of Artificial Language Learning.
  6. Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Training Very Deep Networks. In Advances in Neural Information Processing Systems, pp. 2377–2385.
  7. Lei, T., Zhang, Y., & Artzi, Y. (2017). Training RNNs as Fast as CNNs. arXiv preprint arXiv:1709.02755.
  8. Bradbury, J., Merity, S., Xiong, C., & Socher, R. (2016). Quasi-Recurrent Neural Networks. arXiv preprint arXiv:1611.01576.