There are at least two formalisms of the rollout algorithm: Tesauro and Galperin (1997)[1] and Bertsekas (2005)[2].
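As a rough illustration, here is a minimal sketch of Monte Carlo rollout in the spirit of Tesauro and Galperin (1997): each candidate action is scored by simulating the rest of the episode with a fixed base policy, and the action with the best average return is chosen. The names `rollout_action`, `step`, `actions`, `random_policy` and the toy one-dimensional chain are hypothetical, chosen only for this example; they do not come from either reference.

```python
import random

GOAL = 5  # hypothetical toy problem: walk along positions 0..10 to reach GOAL


def actions(state):
    """Candidate actions in every state: step left or step right."""
    return (-1, +1)


def step(state, action, rng):
    """Toy simulator: returns (next_state, reward, done).

    The move is deterministic here; rng is accepted so that a stochastic
    simulator could be dropped in with the same signature.
    """
    nxt = max(0, min(10, state + action))
    done = nxt == GOAL
    return nxt, (0.0 if done else -1.0), done


def random_policy(state, rng):
    """Base policy to be improved: an unbiased random walk."""
    return rng.choice((-1, +1))


def rollout_action(state, actions, step, base_policy,
                   n_sims=200, horizon=20, rng=None):
    """Return the action with the highest Monte Carlo return estimate.

    For each candidate action, take it once, then follow base_policy until
    the episode ends or `horizon` steps have been simulated; average the
    total reward over n_sims such simulations.
    """
    rng = rng or random.Random(0)
    best_action, best_value = None, float("-inf")
    for a in actions(state):
        total = 0.0
        for _ in range(n_sims):
            s, reward, done = step(state, a, rng)
            ret, t = reward, 1
            while not done and t < horizon:
                s, reward, done = step(s, base_policy(s, rng), rng)
                ret += reward
                t += 1
            total += ret
        value = total / n_sims
        if value > best_value:
            best_action, best_value = a, value
    return best_action


if __name__ == "__main__":
    for start in (3, 4, 7):
        chosen = rollout_action(start, actions, step, random_policy,
                                rng=random.Random(0))
        print(start, chosen)
```

On this toy chain the rollout controller steps toward the goal even though the base policy is only a random walk, which is the policy-improvement effect that rollout relies on. The simulation budget (`n_sims`) and truncation depth (`horizon`) are tuning choices made up for the sketch.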

[Figure: Rollout-bertsekas]

References

  1. Tesauro, G., & Galperin, G. R. (1997). On-line Policy Improvement using Monte-Carlo Search. In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in Neural Information Processing Systems 9 (pp. 1068–1074). MIT Press. Retrieved from http://papers.nips.cc/paper/1302-on-line-policy-improvement-using-monte-carlo-search.pdf
  2. Bertsekas, D. P. (2005). Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC. European Journal of Control, 11(4-5), 310–334. doi:10.3166/ejc.11.310-334