There are at least two formalisms: that of Tesauro and Galperin (1997)[1] and that of Bertsekas (2005)[2].
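Both references concern rollout-style Monte-Carlo policy improvement: estimate the value of each candidate action by simulating a base policy forward, then act greedily on those estimates. The following is a rough sketch only; the `env.step(state, action)` simulator interface and the `policy` callable are assumptions for illustration and do not come from either paper.

```python
def rollout_value(env, state, policy, depth=20, n_sims=8):
    """Monte-Carlo estimate of the return from `state` when the
    base `policy` is followed for up to `depth` steps.
    (Hypothetical simulator interface: env.step -> (state, reward, done).)"""
    total = 0.0
    for _ in range(n_sims):
        s, ret = state, 0.0
        for _ in range(depth):
            a = policy(s)
            s, r, done = env.step(s, a)
            ret += r
            if done:
                break
        total += ret
    return total / n_sims

def rollout_policy(env, state, policy, actions):
    """One-step lookahead: try each action once, score its successor
    state by a rollout under the base policy, and pick the best."""
    best_a, best_v = None, float("-inf")
    for a in actions:
        s2, r, done = env.step(state, a)
        v = r if done else r + rollout_value(env, s2, policy)
        if v > best_v:
            best_a, best_v = a, v
    return best_a
```

The key property (shown in both references) is that the resulting one-step-lookahead policy is at least as good as the base policy it simulates.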


References

  1. Tesauro, G., & Galperin, G. R. (1997). On-line Policy Improvement using Monte-Carlo Search. In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in Neural Information Processing Systems 9 (pp. 1068–1074). MIT Press.
  2. Bertsekas, D. P. (2005). Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC. European Journal of Control, 11(4-5), 310–334. doi:10.3166/ejc.11.310-334