Course: Reinforcement Learning Foundation

Value Iteration Walkthrough

Now that you understand the Bellman equation, this lecture shows how to use it to solve an MDP. Two algorithms, Value Iteration and Policy Iteration, both converge to the same optimal values and an optimal policy π*, but they get there by different routes. We walk through each one by hand on a 3-state corridor and a 3×3 grid before comparing them.
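
As a rough preview of the by-hand walkthrough, here is a minimal Python sketch of Value Iteration on a 3-state corridor. The details are assumptions made for illustration and are not taken from the lesson: states are numbered 0 to 2, state 2 is terminal, moves are deterministic, entering state 2 pays a reward of 1, every other move pays 0, and the discount factor is 0.9. The names `step` and `value_iteration` are hypothetical helpers.

```python
# Assumed setup (not from the lesson): a deterministic 3-state corridor.
GAMMA = 0.9                      # assumed discount factor
N_STATES = 3                     # states 0, 1, 2
TERMINAL = 2                     # assumed terminal state at the right end
ACTIONS = {"left": -1, "right": +1}


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    # Moving past either end of the corridor leaves the agent in place.
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def backup(state, action, V):
    """One-step lookahead value of taking `action` in `state`."""
    next_state, reward = step(state, action)
    return reward + GAMMA * V[next_state]


def value_iteration(theta=1e-8):
    """Apply the Bellman optimality backup until values stop changing."""
    V = [0.0] * N_STATES         # the terminal state keeps value 0
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue
            best = max(backup(s, a, V) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best          # in-place update within the sweep
        if delta < theta:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(ACTIONS, key=lambda a: backup(s, a, V))
        for s in range(N_STATES)
        if s != TERMINAL
    }
    return V, policy


V, policy = value_iteration()
print(V)       # [0.9, 1.0, 0.0]
print(policy)  # {0: 'right', 1: 'right'}
```

Under these assumptions the values settle after three sweeps: state 1 is worth 1 (one step from the reward) and state 0 is worth 0.9 (the same reward, discounted once). The sketch updates values in place during each sweep; a variant that computes every new value from the previous sweep's copy reaches the same fixed point.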