Author: Mauro Comi

Notes on Chapter 6 of Sutton and Barto.

🟧 Recap

We have seen a few methods to optimise the value function.

Policy Iteration

In short:

(Good video: https://www.youtube.com/watch?v=RWQ2yfP8e0o&ab_channel=SanjoyDas)
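
As a rough sketch of policy iteration, here is a minimal tabular version: evaluate the current policy, improve it greedily with respect to the resulting values, and stop once the policy no longer changes. The two-state MDP, the table layout `P[s][a]`, the discount `GAMMA`, and all function names are assumptions made up for illustration; they are not taken from the book or the video.

```python
# Hypothetical 2-state, 2-action MDP, invented purely for illustration.
# P[s][a] is a list of (probability, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def policy_evaluation(P, policy, gamma, tol=1e-8):
    """Iteratively estimate V for a fixed deterministic policy (in-place sweeps)."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def policy_improvement(P, V, gamma):
    """Return the policy that is greedy with respect to V."""
    return {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}


def policy_iteration(P, gamma):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: 0 for s in P}  # arbitrary starting policy
    while True:
        V = policy_evaluation(P, policy, gamma)
        new_policy = policy_improvement(P, V, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


policy, V = policy_iteration(P, GAMMA)
print(policy, V)
```

In this toy MDP, action 1 yields a reward in both states, so the loop settles on choosing action 1 everywhere. Note that this sketch needs the transition table `P`, i.e. a model of the environment, which is exactly what the model-free methods below do without.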

Model-free Policy Evaluation

Model-Free Policy Iteration

The greedy policy is simply estimated as: