The attention mechanism was originally invented for machine translation but quickly found applications in many other tasks. It is useful whenever one needs to "translate" from one structure (images, sequences, trees) to another.
The basic idea is to read the input structure twice: once to encode its gist, and again at each decoding step to "pay attention" to the details relevant to the current output.
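The two-pass idea above can be sketched in a few lines of NumPy: encoder states summarize the input, and at each decoding step a softmax over similarity scores decides which states to "pay attention" to. This is a minimal illustration of global dot-product attention (in the spirit of Luong et al.), not any particular paper's exact model; the function and variable names here are made up for the example.

```python
import numpy as np

def attend(query, encoder_states):
    """Dot-product attention: weight encoder states by similarity to the query.

    query:          decoder state at the current step, shape (d,)
    encoder_states: one vector per input position, shape (T, d)
    Returns a context vector (d,) and the attention weights (T,).
    """
    scores = encoder_states @ query            # similarity per input position, (T,)
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()                   # weights are non-negative, sum to 1
    context = weights @ encoder_states         # weighted average of encoder states, (d,)
    return context, weights

rng = np.random.default_rng(0)
encoder_states = rng.standard_normal((5, 8))   # T=5 input positions, d=8
query = rng.standard_normal(8)                 # hypothetical decoder state
context, weights = attend(query, encoder_states)
```

The decoder would typically concatenate `context` with its own state to predict the next output symbol, and recompute the weights at every step.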
Machine translation
TODO: Luong et al. (2015)[1]
Text processing/understanding
Natural language inference: Parikh et al. (2016)[2]
Visual
Mnih, V., Heess, N., Graves, A., & Kavukcuoglu, K. (2014). Recurrent Models of Visual Attention, 1–12. Retrieved from http://arxiv.org/abs/1406.6247
Ba, J., Mnih, V., & Kavukcuoglu, K. (2014). Multiple Object Recognition with Visual Attention. arXiv Preprint arXiv:1412.7755.
Audio
Chan, W., Jaitly, N., Le, Q. V., & Vinyals, O. (2015). Listen, Attend and Spell. Retrieved from http://arxiv.org/abs/1508.01211
References
- ↑ Luong, M.-T., Pham, H., & Manning, C. D. (2015). Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of EMNLP 2015. Retrieved from http://arxiv.org/abs/1508.04025
- ↑ Parikh, A. P., Täckström, O., Das, D., & Uszkoreit, J. (2016). A Decomposable Attention Model for Natural Language Inference. Retrieved from http://arxiv.org/abs/1606.01933