World4AI

Generalized Policy Iteration

Policy iteration and value iteration are both instances of generalized policy iteration (GPI). What defines GPI is the alternation between policy evaluation and policy improvement, regardless of implementation details such as how many evaluation sweeps are run before each improvement step.
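To make the shared structure concrete, here is a minimal sketch rather than a reference implementation. The 4-state corridor, the reward of -1 per step, the discount factor and all function names are assumptions made up for this illustration. The only thing that separates a policy-iteration-like run from a value-iteration-like run is the hypothetical `eval_sweeps` parameter.

```python
# Hypothetical toy MDP (an assumption for illustration): a corridor with
# states 0..3, where state 3 is terminal and every step costs -1.
N_STATES, TERMINAL, GAMMA = 4, 3, 0.9
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic model: return (next_state, reward)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    return next_state, -1.0


def q_value(values, state, action):
    """One-step lookahead using the current value estimates."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def evaluate(policy, values, sweeps):
    """Partial policy evaluation: a fixed number of synchronous sweeps."""
    delta = 0.0
    for _ in range(sweeps):
        # The terminal state keeps the value 0.
        updated = [q_value(values, s, policy[s]) for s in range(TERMINAL)] + [0.0]
        delta = max(abs(new - old) for new, old in zip(updated, values))
        values = updated
    return values, delta


def improve(values):
    """Policy improvement: act greedily with respect to the current values."""
    return [max(ACTIONS, key=lambda a: q_value(values, s, a))
            for s in range(TERMINAL)]


def generalized_policy_iteration(eval_sweeps, tol=1e-8, max_iterations=1000):
    policy = [0] * TERMINAL      # arbitrary start: always move left
    values = [0.0] * N_STATES
    for _ in range(max_iterations):
        values, delta = evaluate(policy, values, eval_sweeps)
        new_policy = improve(values)
        # Stop once neither evaluation nor improvement changes anything.
        if new_policy == policy and delta < tol:
            break
        policy = new_policy
    return policy, values


# Many sweeps per improvement step behaves like policy iteration,
# a single sweep per improvement step behaves like value iteration.
policy_pi, values_pi = generalized_policy_iteration(eval_sweeps=200)
policy_vi, values_vi = generalized_policy_iteration(eval_sweeps=1)
```

On this toy problem both settings end at the same always-move-right policy. They differ only in how much evaluation work is done between improvement steps.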

Many of the reinforcement learning algorithms yet to come are based on generalized policy iteration. Evaluation and improvement alternate until neither step changes anything, at which point the value function and the policy are optimal.
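The stopping condition can be written down explicitly. The notation below is an assumption, since the text itself introduces no symbols: $v_\pi$ and $q_\pi$ denote the state-value and action-value functions of a policy $\pi$, and $v_*$ the optimal value function. The process comes to rest when the policy is greedy with respect to its own value function, and that is exactly the Bellman optimality condition.

```latex
\pi(s) \in \arg\max_{a} q_\pi(s, a) \quad \text{for all } s
\qquad \Longrightarrow \qquad
v_\pi(s) = \max_{a} q_\pi(s, a) = v_*(s) \quad \text{for all } s
```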