Natural Language Understanding Wiki


  • Deterministic vs. randomized policy
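
As a minimal sketch of the distinction, assuming a toy setting with integer states and two hypothetical actions (none of these names come from a particular library): a deterministic policy maps each state to a single action, whereas a randomized (stochastic) policy maps each state to a probability distribution over actions and samples from it.

```python
import random

# Hypothetical toy action set, chosen only for illustration.
ACTIONS = ["shift", "reduce"]

def deterministic_policy(state: int) -> str:
    """Always returns the same action for a given state."""
    return "shift" if state % 2 == 0 else "reduce"

def randomized_policy(state: int, rng: random.Random) -> str:
    """Samples an action from a state-dependent distribution."""
    p_shift = 0.9 if state % 2 == 0 else 0.1
    return rng.choices(ACTIONS, weights=[p_shift, 1.0 - p_shift], k=1)[0]
```

Calling `deterministic_policy(2)` repeatedly always yields the same action, while repeated calls to `randomized_policy(2, rng)` yield "shift" most of the time and "reduce" occasionally.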


  • Optimal Learning Trajectories: Some algorithms assume that the Optimal Learning Trajectories (OLTs) are known for all learning examples. An OLT is a sequence of actions that, given an input, leads from the initial state to the correct output.
  • Optimal Learning Policy: Some algorithms assume that an Optimal Learning Policy (OLP) is known for each learning example. The OLP is a procedure that returns the best decision to make in any state of the prediction space, including states that do not lie on an optimal trajectory.
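
The difference between the two assumptions can be sketched as follows. This is an illustrative example only, assuming a left-to-right sequence labeling task scored by Hamming loss; the function names and the tag set are hypothetical. Under that assumption, the OLT is simply the gold label sequence, while the OLP can be queried in any state, including one reached after a mistake.

```python
from typing import List, Tuple

# A state is the tuple of labels predicted so far (an assumption of this sketch).
State = Tuple[str, ...]

def optimal_learning_trajectory(gold: List[str]) -> List[str]:
    """OLT: the action sequence leading from the empty state to the gold output."""
    return list(gold)

def optimal_learning_policy(state: State, gold: List[str]) -> str:
    """OLP: the best action in any state, even one reached after errors.

    Under Hamming loss, earlier mistakes do not change the best next label,
    which is the gold label at the next position.
    """
    return gold[len(state)]

def step(state: State, action: str) -> State:
    """Apply an action by appending the predicted label to the state."""
    return state + (action,)

gold = ["DET", "NOUN", "VERB"]

# Following the OLT from the initial state reproduces the correct output.
state: State = ()
for action in optimal_learning_trajectory(gold):
    state = step(state, action)

# The OLP still gives advice in a state that is off the optimal trajectory.
off_path: State = ("NOUN",)  # the first prediction was wrong
next_best = optimal_learning_policy(off_path, gold)
```

An OLT only says what to do along the single correct path, so it gives no guidance once the learner has deviated from that path. An OLP is the stronger assumption, since it must be defined for every reachable state.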

Arguments for reinforcement learning

In many tasks, there is not enough labeled training data for supervised learning to succeed. See also Yoshua Bengio's argument for unsupervised learning[note 1].

Anthropomorphic argument (albeit a weak one): children learn from a small amount of "labeled" data. Humans of all ages learn by trial and error, by simulating their environment, and so on.


See also



  1. Note that Bengio means a non-standard form of unsupervised learning in which an agent can also interact with its environment.
  1. Tanner, B., & White, A. (2009). RL-Glue: Language-independent software for reinforcement-learning experiments. Journal of Machine Learning Research, 10(Sep), 2133-2136.