Now that you understand the Bellman equation, this lecture shows you how to use it to actually solve an MDP. Two algorithms, Value Iteration and Policy Iteration, both converge to an optimal policy π* with the same optimal value function (when several policies tie, they may return different ones), but they get there along different routes. We walk through each one by hand on a 3-state corridor and a 3×3 grid before comparing them.