## Terminology

- Deterministic vs. randomized policy
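
As a minimal illustration of the distinction: a deterministic policy maps each state to exactly one action, whereas a randomized (stochastic) policy maps each state to a probability distribution over actions and samples from it. The two-state toy problem and all names below are hypothetical and not taken from any framework listed on this page.

```python
import random

# Hypothetical toy problem: two states, two actions.

# Deterministic policy: each state maps to exactly one action.
DETERMINISTIC = {"s0": "left", "s1": "right"}

# Randomized policy: each state maps to a distribution over actions.
RANDOMIZED = {
    "s0": {"left": 0.9, "right": 0.1},
    "s1": {"left": 0.5, "right": 0.5},
}


def act_deterministic(state):
    """Always return the same action for a given state."""
    return DETERMINISTIC[state]


def act_randomized(state, rng):
    """Sample an action from the state's action distribution."""
    dist = RANDOMIZED[state]
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
det_samples = {act_deterministic("s0") for _ in range(100)}
rand_samples = [act_randomized("s1", rng) for _ in range(100)]
```

Calling the deterministic policy repeatedly in the same state always yields the same action, while repeated calls to the randomized policy in `s1` yield a mix of both actions.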

## Training

- Optimal Learning Trajectories: Some algorithms assume that the Optimal Learning Trajectories (OLTs) are known for all learning examples. An OLT is a sequence of actions that, given an input, leads from the initial state to the correct output.
- Optimal Learning Policy: Some algorithms assume that for each learning example, we know an Optimal Learning Policy (OLP). The OLP is a procedure that knows the best decision to perform for any state of the prediction space.
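
The difference between the two assumptions can be sketched on a toy sequence-labeling task. Everything here is an illustrative assumption rather than part of any specific algorithm: the state is the list of labels emitted so far, an action appends one label, and the quality of an output is measured by Hamming loss, so the best action at position `i` is the correct label at `i` regardless of earlier mistakes.

```python
def optimal_trajectory(target):
    """OLT: the action sequence leading from the empty state to `target`."""
    return list(target)


def optimal_policy(state, target):
    """OLP: best next action in ANY state, even one containing mistakes.

    Assumes Hamming loss, so the best action at position i is target[i].
    Returns None once the output is complete.
    """
    position = len(state)
    if position >= len(target):
        return None
    return target[position]


def rollout(start_state, target):
    """Follow the optimal policy from an arbitrary start state."""
    state = list(start_state)
    action = optimal_policy(state, target)
    while action is not None:
        state.append(action)
        action = optimal_policy(state, target)
    return state


def hamming_loss(output, target):
    """Number of positions where the output differs from the target."""
    return sum(o != t for o, t in zip(output, target))


target = ["D", "N", "V"]            # correct output for one training example
olt = optimal_trajectory(target)    # defined only along the correct path
recovered = rollout(["N"], target)  # start from a state with one mistake
```

The trajectory only covers states on the correct path, whereas the policy can also be queried in the erroneous state `["N"]`, where it completes the output with the fewest additional errors.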

## Arguments for reinforcement learning

There's **not enough training data** for supervised learning to succeed in many tasks. See also Yoshua Bengio's argument for unsupervised learning^{[note 1]}.

**Anthropomorphic argument** (albeit a weak one): children learn from a small amount of "labeled" data. Humans of all ages learn by trial and error, by simulating the environment, and so on.

## Frameworks

- TensorForce: deep integration with TensorFlow, good amount of documentation
  - GitHub repo: https://github.com/reinforceio/tensorforce
  - Blog: https://reinforce.io/blog/

- Keras-rl: simplistic, comes with some documentation
- Google's Dopamine: research-oriented framework
- Facebook's Horizon: production-ready framework based on PyTorch
- Gorilla
- RL-Glue: old framework, unlikely to scale
  - Reference: Tanner and White (2009)^{[1]}
  - Website: https://sites.google.com/a/rl-community.org/rl-glue/Home?authuser=0
- A list of many frameworks is here

## See also

- Solving Relational and First-order Markov decision processes
- Policy gradient methods
- An introduction to Policy Gradient Methods
- Jürgen Schmidhuber's futuristic RL
- Satinder Singh's homepage
- RL blog (with links to labs and other resources)
- Trial and error in humans: while a typical RL algorithm would need millions of attempts, this girl needs 16. She does so not by policy gradient or value iteration but by consciously thinking about the problem and different solutions, perhaps also by simulating them in her mind.

## References

- [1] Tanner, B., & White, A. (2009). RL-Glue: Language-independent software for reinforcement-learning experiments. *Journal of Machine Learning Research*, *10*(Sep), 2133-2136.

## Notes

- [note 1] Note that he means non-standard unsupervised learning in which an agent can also interact with its environment.