Natural Language Understanding Wiki

For an early overview of multi-task learning, see Caruana (1997)[1].

From the abstract of Yang et al. (2016)[2]:

"We present a deep hierarchical recurrent neural network for sequence tagging. Given a sequence of words, our model employs deep gated recurrent units on both character and word levels to encode morphology and context information, and applies a conditional random field layer to predict the tags. Our model is task independent, language independent, and feature engineering free. We further extend our model to multi-task and crosslingual joint training by sharing the architecture and parameters."

Diagram from Yang et al. (2016)
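The pipeline in the quoted abstract (a character-level GRU that encodes morphology, a word-level GRU that encodes context over the concatenated word and character representations, and a CRF layer on top) can be sketched as a forward pass. This is an illustrative NumPy sketch with random, untrained weights, not the authors' implementation: all dimensions, parameter names, and the toy word list are assumptions. A real model would learn every parameter end-to-end via the CRF log-likelihood, and multi-task/cross-lingual training would share these parameters across tasks and languages.

```python
import numpy as np

rng = np.random.default_rng(0)
DC, DW, DH, K = 8, 10, 12, 5   # char-emb, word-emb, hidden, tag-set sizes (illustrative)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_params(d_in, d_h):
    # one input/recurrent weight pair per gate: update z, reset r, candidate h
    shapes = {"Wz": (d_h, d_in), "Uz": (d_h, d_h),
              "Wr": (d_h, d_in), "Ur": (d_h, d_h),
              "Wh": (d_h, d_in), "Uh": (d_h, d_h)}
    return {k: rng.normal(0.0, 0.1, s) for k, s in shapes.items()}

def gru_seq(xs, p, d_h):
    # run a GRU over a list of input vectors; return all hidden states
    h, outs = np.zeros(d_h), []
    for x in xs:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)
        h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))
        h = (1 - z) * h + z * h_cand
        outs.append(h)
    return outs

def viterbi(emissions, transitions):
    # max-scoring tag sequence under a linear-chain CRF
    dp, back = emissions[0].copy(), []
    for t in range(1, len(emissions)):
        scores = dp[:, None] + transitions + emissions[t][None, :]
        back.append(scores.argmax(axis=0))
        dp = scores.max(axis=0)
    path = [int(dp.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]

# --- untrained demo weights (a real model learns all of these) ---
CHARS = "abcdefghijklmnopqrstuvwxyz"
char_emb = rng.normal(0.0, 0.1, (len(CHARS), DC))
char_gru = gru_params(DC, DH)               # character level: morphology
word_gru = gru_params(DW + DH, DH)          # word level: context
W_tag, b_tag = rng.normal(0.0, 0.1, (DH, K)), np.zeros(K)
transitions = rng.normal(0.0, 0.1, (K, K))  # CRF tag-transition scores

words = ["eu", "rejects", "german", "call"]
word_emb = {w: rng.normal(0.0, 0.1, DW) for w in set(words)}

def encode_word(w):
    # final char-GRU state serves as the word's morphology-aware vector
    return gru_seq([char_emb[CHARS.index(c)] for c in w], char_gru, DH)[-1]

inputs = [np.concatenate([word_emb[w], encode_word(w)]) for w in words]
states = gru_seq(inputs, word_gru, DH)
emissions = np.stack([h @ W_tag + b_tag for h in states])
path = viterbi(emissions, transitions)      # one tag index per word
```

Note how the design keeps the encoder task-independent: only `W_tag`, `b_tag`, and `transitions` depend on the tag set, which is what makes sharing the GRU layers across tasks and languages straightforward.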


  1. R. Caruana. 1997. Multitask Learning. Machine Learning, 28:41–75.
  2. Z. Yang, R. Salakhutdinov and W. Cohen. 2016. Multi-Task Cross-Lingual Sequence Tagging from Scratch. arXiv:1603.06270.