Exploration strategies fall into two broad categories: undirected and directed (Thrun, 1992)[1]. Undirected algorithms rely solely on randomness to provide exploration, while directed algorithms incorporate knowledge about the learning process itself.

  • Undirected exploration
    • random walk (Nguyen and Widrow, 1989)[2]
    • exploration by modified probability distribution
      • Boltzmann distributions (Barto et al. 1991)[3]
      • semi-uniform distributions (Whitehead and Ballard, 1991)[4]
  • Directed exploration
    • Error-based exploration: revisit states that have previously shown a high prediction error, in order to maximize the knowledge gained (Schmidhuber, 1991)[5]
    • Recency-based exploration: prefer states that have been visited least recently (assumes that the control knowledge for a state degrades the longer it goes without being updated) (Sutton, 1990)[6]
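
The selection rules above can be sketched as minimal action-selection functions. This is an illustration only; the function names and the parameters `temperature`, `p_best`, and `bonus_weight` are assumptions for the sketch, not taken from the cited papers:

```python
import math
import random

def boltzmann_action(q_values, temperature=1.0):
    """Undirected: sample an action from a Boltzmann (softmax)
    distribution over estimated Q-values."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(range(len(q_values)), weights=probs)[0]

def semi_uniform_action(q_values, p_best=0.8):
    """Undirected: semi-uniform distribution -- pick the greedy action
    with probability p_best, otherwise a uniformly random action."""
    if random.random() < p_best:
        return max(range(len(q_values)), key=lambda a: q_values[a])
    return random.randrange(len(q_values))

def recency_based_action(q_values, last_visit, current_time, bonus_weight=0.1):
    """Directed (recency-based): add a bonus that grows with the time
    since each action was last tried, then act greedily on the sum."""
    scores = [q + bonus_weight * (current_time - t)
              for q, t in zip(q_values, last_visit)]
    return max(range(len(scores)), key=lambda a: scores[a])
```

Lowering `temperature` makes the Boltzmann rule closer to greedy, while raising it approaches uniform random choice; the recency bonus similarly trades off exploitation (`q`) against exploration (time since last visit).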

References

  1. Thrun, S. B. (1992). Efficient exploration in reinforcement learning.
  2. Nguyen, D., & Widrow, B. (1989). The truck backer-upper: An example of self-learning in neural networks. IEEE.
  3. Barto, A. G., Bradtke, S. J., & Singh, S. P. (1991). Real-time learning and control using asynchronous dynamic programming (COINS Technical Report 91-57).
  4. Whitehead, S. D., & Ballard, D. H. (1991). Learning to Perceive and Act by Trial and Error. Machine Learning, 7(1), 45–83. doi:10.1023/A:1022619109594
  5. Schmidhuber, J. (1991). Adaptive Confidence And Adaptive Curiosity.
  6. Sutton, R. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Conference on Machine Learning, 216–224.