Exploration techniques fall into two broad categories: undirected and directed (Thrun, 1992). Undirected algorithms rely solely on randomness to explore, while directed ones use knowledge about the learning process to guide exploration.
- Undirected exploration
- Directed exploration
  - Error-based exploration: drive the agent toward states that have previously shown a high prediction error, in order to maximize the knowledge gain (Schmidhuber, 1991)
  - Recency-based exploration: drive the agent toward the states visited least recently, on the assumption that the control knowledge about a state degrades the longer it goes without being updated (Sutton, 1990)
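
The three ideas above can be sketched as action-selection rules. This is a minimal illustration under stated assumptions, not the formulation used in the cited papers: epsilon-greedy stands in for undirected exploration, the per-action lists `last_error` and `last_tried` are hypothetical bookkeeping that approximates the state-level quantities described above, and the additive bonuses (including the square root of elapsed time) and the `weight` parameter are choices made for this example only.

```python
import math
import random


def _argmax(scores):
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def undirected_action(q_values, epsilon, rng):
    """Undirected: explore uniformly at random with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return _argmax(q_values)


def error_based_action(q_values, last_error, weight):
    """Directed (assumed bonus form): favor actions whose outcome was
    predicted poorly the last time they were tried."""
    return _argmax([q + weight * e for q, e in zip(q_values, last_error)])


def recency_based_action(q_values, last_tried, now, weight):
    """Directed (assumed bonus form): favor actions that have gone the
    longest without being tried."""
    return _argmax([
        q + weight * math.sqrt(now - t)
        for q, t in zip(q_values, last_tried)
    ])


q = [1.0, 0.5, 0.2]
greedy = undirected_action(q, epsilon=0.0, rng=random.Random(0))
curious = error_based_action(q, last_error=[0.0, 0.1, 2.0], weight=1.0)
stale = recency_based_action(q, last_tried=[9, 8, 0], now=10, weight=0.5)
print(greedy, curious, stale)
```

With these made-up numbers, the greedy choice is action 0, while both directed rules pick action 2: it has the lowest value estimate, but also the largest prediction error and the longest time since it was last tried.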
- Thrun, S. B. (1992). Efficient exploration in reinforcement learning.
- Nguyen, D., & Widrow, B. (1989). Truck backer-upper: An example of self-learning in neural networks. IEEE.
- Barto, A. G., Bradtke, S. J., & Singh, S. P. (1991). Real-time learning and control using asynchronous dynamic programming (COINS Technical Report 91-57).
- Whitehead, S. D., & Ballard, D. H. (1991). Learning to perceive and act by trial and error. Machine Learning, 7(1), 45–83. doi:10.1023/A:1022619109594
- Schmidhuber, J. (1991). Adaptive confidence and adaptive curiosity.
- Sutton, R. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Conference on Machine Learning, 216–224.