A neural network with a maxout layer can be viewed as automatically learning and choosing multiple senses of a word. Devlin et al. (2015) made use of such architecture (among others) but didn't mention word senses.
- Devlin, Jacob and Quirk, Chris and Menezes, Arul (2015): Pre-Computable Multi-Layer Neural Network Language Models, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. PDF