Natural Language Understanding Wiki

Dropout is an important technique to improve generalization of neural networks. As Semeniuta et al. (2016)[1] have stated, "since its introduction, it has become, together with the L2 weight decay, the de-facto standard neural network regularization method."

Usage: "For the RNN alone, before recurrent connections seems to be a good choice. Yet we observed, especially for complete recognition systems, that applying dropout on the classification features, i.e. after the last LSTM, is crucial to observe WER improvements, without much degradation of the performance of the RNN alone." (Bluche et al., 2015) [2]

Dropout for logistic regression

Dropout prevents co-adaptation, i.e. it encourages features to contribute independently, and therefore improves the performance of logistic regression on small datasets.
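As a minimal sketch of the idea, the following trains a logistic regression with dropout on its inputs. It assumes that dropout is applied to the input features and that "inverted" dropout is used (kept features are rescaled during training, so nothing changes at test time). The function names, the learning rate and the keep probability are illustrative choices, not taken from any cited paper.

```python
import math
import random


def dropout(features, keep_prob, rng, training=True):
    """Inverted dropout: zero each feature with probability 1 - keep_prob
    and rescale the survivors so the expected value is unchanged."""
    if not training:
        return list(features)
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in features]


def predict(weights, bias, features):
    """Logistic regression probability for one example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(weights, bias, features, label, lr, keep_prob, rng):
    """One stochastic gradient step on the log loss, using dropped-out inputs."""
    x = dropout(features, keep_prob, rng)
    error = predict(weights, bias, x) - label
    new_weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return new_weights, bias - lr * error


# Toy usage: each step sees a different random subset of the features,
# so no weight can rely on another feature always being present.
rng = random.Random(0)
weights, bias = [0.0, 0.0, 0.0], 0.0
for _ in range(100):
    weights, bias = sgd_step(weights, bias, [1.0, 0.5, -1.0], 1, 0.1, 0.8, rng)
```

At prediction time, `predict` is called on the unmodified features, since the rescaling during training already keeps the expected input the same.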


Dropout in RNN

Moon et al. (2015)[3] proposed applying dropout to the hidden state of recurrent units, but a dropped unit forgets all the information it held. To alleviate this problem, they use a per-sequence dropout mask, sampled once and reused at every time step. Another problem occurs at test time, when the hidden state must be multiplied by the probability of not dropping a unit. Because this factor is applied at every time step, it compounds over the sequence and limits the network's ability to remember its past. These issues apply to LSTM and GRU but not to vanilla RNNs.
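The two points above can be illustrated with a small sketch. It contrasts a mask resampled at every time step with a mask sampled once per sequence, and computes how quickly a repeated test-time factor shrinks. All names are hypothetical, and `cumulative_scale` deliberately ignores recurrent weights and nonlinearities, so it only shows the trend rather than the behaviour of a real network.

```python
import random


def sample_mask(size, keep_prob, rng):
    """Bernoulli mask: 1 keeps a unit, 0 drops it."""
    return [1 if rng.random() < keep_prob else 0 for _ in range(size)]


def per_step_masks(steps, size, keep_prob, rng):
    """A fresh mask at every time step."""
    return [sample_mask(size, keep_prob, rng) for _ in range(steps)]


def per_sequence_masks(steps, size, keep_prob, rng):
    """One mask sampled once and reused at every time step of the sequence."""
    mask = sample_mask(size, keep_prob, rng)
    return [mask for _ in range(steps)]


def surviving_units(masks):
    """True for units that are never dropped, so their state can persist."""
    return [all(step[i] for step in masks) for i in range(len(masks[0]))]


def cumulative_scale(keep_prob, steps):
    """Factor accumulated when the state is multiplied by keep_prob at each step."""
    return keep_prob ** steps


rng = random.Random(0)
# With per-step masks, few units survive a long sequence untouched;
# with a per-sequence mask, every unit kept at step 0 is kept throughout.
print(sum(surviving_units(per_step_masks(20, 100, 0.8, rng))))
print(sum(surviving_units(per_sequence_masks(20, 100, 0.8, rng))))
print(cumulative_scale(0.5, 3))  # 0.125
```

With a keep probability of 0.8 and 20 steps, a given unit survives every per-step mask with probability 0.8 ** 20, roughly 1%, whereas under a per-sequence mask about 80% of the units are kept for the whole sequence.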

Semeniuta et al. (2016)[1] propose applying dropout to the hidden state updates instead, so that the stored state itself is never dropped. They demonstrate the effectiveness of this approach on language modeling, named entity recognition and sentiment analysis.
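The difference between dropping the state and dropping the update can be sketched on a single LSTM cell update. This assumes the standard form c_t = f_t * c_{t-1} + i_t * g_t, takes the gate values as given inputs rather than computing them from weights, and uses inverted dropout for simplicity. The function names are hypothetical, and the code is a simplified illustration rather than the authors' implementation.

```python
import random


def drop(values, keep_prob, rng, training=True):
    """Inverted dropout on a vector."""
    if not training:
        return list(values)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def cell_update_drop_state(c_prev, f, i, g, keep_prob, rng):
    """Dropout on the memory itself: a dropped unit loses what it stored."""
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    return drop(c, keep_prob, rng)


def cell_update_drop_update(c_prev, f, i, g, keep_prob, rng):
    """Dropout on the candidate update only: c_prev always passes through."""
    g_dropped = drop(g, keep_prob, rng)
    return [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g_dropped)]


rng = random.Random(0)
c_prev = [1.0, 2.0, 3.0]
ones = [1.0, 1.0, 1.0]          # forget and input gates fully open
g = [0.5, 0.5, 0.5]             # candidate update
print(cell_update_drop_state(c_prev, ones, ones, g, 0.5, rng))
print(cell_update_drop_update(c_prev, ones, ones, g, 0.5, rng))
```

In the second variant, every output is at least the corresponding value of `c_prev`, because only the increment `g` can be zeroed. In the first variant, a dropped unit is reset to zero regardless of what it had stored.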


  1. Semeniuta, S., Severyn, A., & Barth, E. (2016). Recurrent Dropout without Memory Loss.
  2. Bluche, T., Kermorvant, C., & Louradour, J. (2015). Where to apply dropout in recurrent neural networks for handwriting recognition? In 13th International Conference on Document Analysis and Recognition, ICDAR 2015, Tunis, Tunisia, August 23-26, 2015, pages 681–685.
  3. Moon, T., Choi, H., Lee, H., & Song, I. (2015). RNNDrop: A novel dropout for RNNs in ASR. Automatic Speech Recognition and Understanding (ASRU).