Value Iteration Walkthrough

Now that you understand the Bellman equation, this lecture shows you how to use it to actually solve an MDP. Two algorithms — Value Iteration and Policy Iteration — both converge to the same optimal policy π*, but they get there along different routes. We walk through each one by hand on a 3-state corridor and a 3×3 grid before comparing them.

We noticed you're visiting from United Arab Emirates. We've updated our prices to United Arab Emirates dirham for your shopping convenience. Use United States (US) dollar instead. Dismiss