There are at least two formalisms: that of Tesauro and Galperin (1997)[1] and that of Bertsekas (2005)[2].


