MIXER, which stands for Mixed Incremental Cross-Entropy REINFORCE, was proposed by Ranzato et al. (2016)[1] for sequence-level training of recurrent text generation models. It borrows some ideas from Learning to Search (LoS) algorithms but is much simpler. Like LoS, it starts training with the optimal (gold) policy and gradually moves towards the model's actual policy. Unlike LoS, it does not tinker with the policy itself or with how the action sequence is generated: it simply trains the first part of each output sequence with cross-entropy against the gold tokens (the optimal policy) and the remainder with REINFORCE on the model's own samples, shrinking the cross-entropy portion as training progresses.
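A minimal sketch of what this schedule might look like in PyTorch is given below. The toy GRU decoder, the dummy reward function, and the hyper-parameters (DELTA, the batch, the number of updates per stage) are illustrative assumptions, not taken from the paper's released code; the paper additionally subtracts a learned baseline from the reward, which is omitted here for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDecoder(nn.Module):
    """A tiny stand-in for the RNN text generator trained by MIXER."""
    def __init__(self, vocab_size=100, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def step(self, token, h):
        h = self.rnn(self.embed(token), h)
        return self.out(h), h          # logits over the next token, new hidden state

def mixer_loss(model, gold, prefix_len, reward_fn):
    """Cross-entropy on the first `prefix_len` gold tokens, REINFORCE on the rest."""
    batch, T = gold.shape
    h = torch.zeros(batch, model.rnn.hidden_size)
    token = gold[:, 0]
    xent, log_probs, generated = 0.0, [], [gold[:, 0]]
    for t in range(1, T):
        logits, h = model.step(token, h)
        if t <= prefix_len:                      # XENT part: teacher forcing on gold tokens
            xent = xent + F.cross_entropy(logits, gold[:, t])
            token = gold[:, t]
        else:                                    # REINFORCE part: sample from the model itself
            dist = torch.distributions.Categorical(logits=logits)
            token = dist.sample()
            log_probs.append(dist.log_prob(token))
        generated.append(token)
    loss = xent
    if log_probs:                                # sequence-level reward, e.g. BLEU in the paper
        reward = reward_fn(torch.stack(generated, dim=1), gold)        # shape: (batch,)
        loss = loss - (reward.detach() * torch.stack(log_probs).sum(0)).mean()
    return loss

# Annealing schedule: start with pure cross-entropy (prefix covers the whole
# sequence), then move DELTA more tokens per stage under REINFORCE until the
# entire sequence is generated by the model's own policy.
model = ToyDecoder()
opt = torch.optim.Adam(model.parameters())
gold = torch.randint(0, 100, (8, 20))                                  # dummy gold batch
reward_fn = lambda hyp, ref: (hyp == ref).float().mean(dim=1)          # toy reward
T, DELTA = gold.shape[1], 5
for prefix_len in range(T, -1, -DELTA):
    for _ in range(2):                           # a few updates per stage (illustrative)
        opt.zero_grad()
        loss = mixer_loss(model, gold, prefix_len, reward_fn)
        loss.backward()
        opt.step()
```

The key point of the sketch is the `prefix_len` schedule: each stage keeps the cross-entropy anchor on a shorter gold prefix, so the model is gradually exposed to (and rewarded on) its own predictions rather than always conditioning on gold history.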

References

  1. Ranzato, M., Chopra, S., Auli, M., & Zaremba, W. (2016). Sequence Level Training with Recurrent Neural Networks. ICLR, 1–15. Retrieved from http://arxiv.org/abs/1511.06732