Revision as of 04:37, 4 November 2018
Terminology
- Deterministic vs. randomized policy
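The distinction can be sketched as follows (a minimal illustration; the state, actions, and probabilities are made up): a deterministic policy maps each state to exactly one action, while a randomized (stochastic) policy maps each state to a distribution over actions and samples from it.

```python
import random

ACTIONS = ["left", "right"]

def deterministic_policy(state):
    # Deterministic: the same state always yields the same action.
    return "right" if state >= 0 else "left"

def randomized_policy(state, rng=random):
    # Randomized: the action is sampled from a state-dependent distribution.
    p_right = 0.9 if state >= 0 else 0.1
    return "right" if rng.random() < p_right else "left"

print(deterministic_policy(1))   # always "right"
print(randomized_policy(1))      # usually "right", occasionally "left"
```

A deterministic policy is reproducible; a randomized one naturally supports exploration, which is why many RL algorithms train stochastic policies.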
Training
- Optimal Learning Trajectories: Some algorithms assume that the Optimal Learning Trajectories (OLTs) are known for all learning examples. An OLT is a sequence of actions that, given an input, leads from the initial state to the correct output.
- Optimal Learning Policy: Some algorithms assume that for each learning example, we know an Optimal Learning Policy (OLP). The OLP is a procedure that knows the best decision to perform for any state of the prediction space.
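The two assumptions can be contrasted on a toy task (purely illustrative; the task and function names are made up): producing a target string one character at a time. The OLT for a labeled example is just the action sequence that spells the target; the OLP additionally prescribes the best next action from any state of the prediction space, not only states on the optimal trajectory.

```python
def optimal_trajectory(target):
    # OLT: one known-optimal action sequence from the initial (empty)
    # state to the correct output for this example.
    return list(target)

def optimal_policy(state, target):
    # OLP: the best next action for ANY state, here simply the target
    # character at the current position (None once the output is complete).
    pos = len(state)
    return target[pos] if pos < len(target) else None

# Rolling out the OLP from the initial state recovers the OLT.
state = ""
while True:
    action = optimal_policy(state, "cat")
    if action is None:
        break
    state += action
print(state)  # "cat"
```

The practical difference: an OLT only tells the learner what to do along one correct path, whereas an OLP can supervise the learner even after it has strayed into states no optimal trajectory visits.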
Arguments for reinforcement learning
There's not enough training data for supervised learning to succeed in many tasks. See also Yoshua Bengio's argument for unsupervised learning[note 1].
Anthropomorphic argument (albeit a weak one): children learn from a small amount of "labeled" data. Humans of all ages learn by trial and error, environment simulation, etc.
Frameworks
- Google's Dopamine
- Facebook's Horizon: production-ready framework based on PyTorch
- Gorilla (http://arxiv.org/pdf/1507.04296.pdf)
- RL-Glue: old framework, unlikely to scale
- Reference: Tanner and White (2009)[1]
- Website: https://sites.google.com/a/rl-community.org/rl-glue/Home?authuser=0
See also
- Solving Relational and First-order Markov decision processes
- Policy gradient methods
- An introduction to Policy Gradient Methods
- Jürgen Schmidhuber's futuristic RL
- Satinder Singh's homepage
- RL blog (with links to labs and other resources)
- Trial and error in humans -- while a typical RL algorithm would need millions of attempts, this girl needed only 16. She succeeded not by policy gradient or value iteration but by consciously thinking about the problem and different solutions, perhaps also by simulating them in her mind.
References
- ↑ Tanner, B., & White, A. (2009). RL-Glue: Language-independent software for reinforcement-learning experiments. Journal of Machine Learning Research, 10(Sep), 2133-2136.
Notes
- ↑ Note that he means non-standard unsupervised learning in which the agent can also interact with its environment.