Author: Mauro Comi
Notes on Chapter 6 of Sutton and Barto.
Model-free prediction: estimate the value function of an unknown MDP.
Example: a tabular TD(0) update for a sampled transition $(S_t, R_{t+1}, S_{t+1})$:
$v_{t+1}(S_t) = v_t(S_t) + \alpha(R_{t+1} + \gamma v_t(S_{t+1}) - v_t(S_t))$
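The update above can be sketched in code. This is a minimal illustration, not the book's implementation; the 5-state table, the step size, and the discount factor are arbitrary choices for the example.

```python
import numpy as np

def td0_update(v, s, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular TD(0) update:
    v(S_t) <- v(S_t) + alpha * (R_{t+1} + gamma * v(S_{t+1}) - v(S_t))."""
    td_error = r + gamma * v[s_next] - v[s]  # TD error: target minus current estimate
    v[s] = v[s] + alpha * td_error
    return v

v = np.zeros(5)                      # value table for a hypothetical 5-state MDP
v = td0_update(v, s=0, r=1.0, s_next=1)
```

Note that the update uses the bootstrapped target $R_{t+1} + \gamma v_t(S_{t+1})$, so it can be applied online after every step, unlike a Monte Carlo update which must wait for the end of the episode.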
Model-free control: optimise the value function of an unknown MDP.
We have already seen a few methods for optimising the value function.

In short:
(Good video: https://www.youtube.com/watch?v=RWQ2yfP8e0o&ab_channel=SanjoyDas)
Model-Free Policy Evaluation

Model-Free Policy Iteration
The greedy policy is estimated directly from the action-value function:
$\pi'(S_t) = \text{argmax}_{a \in \mathcal{A}} \, q(S_t, a)$
(greedy improvement over $v$ would require a model of the MDP's transitions, so model-free control works with $q$ instead).
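As a small sketch, greedy action selection over a tabular $q$ is just an argmax over the row for the current state. The toy Q-table below is invented for illustration.

```python
import numpy as np

def greedy_policy(q, s):
    """Greedy action for state s: argmax over the action-value row q[s]."""
    return int(np.argmax(q[s]))

q = np.array([[0.0, 1.0, 0.5]])  # hypothetical Q-table: 1 state, 3 actions
a = greedy_policy(q, 0)          # picks the action with the highest q-value
```

In practice, acting purely greedily prevents exploration, which is why control methods typically use an $\epsilon$-greedy variant that takes a random action with small probability $\epsilon$.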