This paper mathematically analyzes the integral generalized policy iteration (I-GPI) algorithms applied to a class of continuous-time linear quadratic regulation (LQR) problems with an unknown system matrix A. GPI is the general idea of letting the policy evaluation and policy improvement steps of policy iteration (PI) interact to compute the optimal policy. We first introduce the update horizon H, and then show that (i) all I-GPI methods with the same H can be considered equivalent and (ii) the value function approximated in the policy evaluation step converges monotonically to the exact one as H→∞. This reveals the relation between the computational complexity and the update (or time) horizon of I-GPI, as well as the relation between I-PI and I-GPI in the limit H→∞. We also provide and discuss two modes of convergence of I-GPI: in one mode, I-GPI behaves like PI; in the other, it performs like value iteration for discrete-time LQR and like infinitesimal GPI (H→0). From these results, a new classification of integral reinforcement learning with respect to H is formed. Two matrix inequality conditions for stability, the region of local monotone convergence, and data-driven (adaptive) implementation methods are also provided and discussed in detail. Numerical simulations are carried out for verification and further investigation.
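As a rough illustration of claims (i) and (ii), the following is a minimal model-based sketch that assumes A is known (the I-GPI methods analyzed in the paper are data-driven and do not need A). It assumes the update horizon is H = N·T, where each policy evaluation performs N integral Bellman updates P ← ∫₀ᵀ e^{A_Kᵀs}(Q + KᵀRK)e^{A_K s} ds + e^{A_KᵀT} P e^{A_K T} with time step T, and A_K = A − BK. The integral is computed with Van Loan's block-matrix exponential; the helper names (`integral_terms`, `evaluate`, `gpi`) and the example system are hypothetical choices for illustration only.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_are, solve_continuous_lyapunov


def integral_terms(Ak, M, T):
    """Van Loan's method: return W = int_0^T e^{Ak^T s} M e^{Ak s} ds and Phi = e^{Ak T}."""
    n = Ak.shape[0]
    C = np.zeros((2 * n, 2 * n))
    C[:n, :n] = -Ak.T
    C[:n, n:] = M
    C[n:, n:] = Ak
    E = expm(C * T)
    Phi = E[n:, n:]                      # lower-right block equals e^{Ak T}
    return Phi.T @ E[:n, n:], Phi        # Phi^T times upper-right block gives the integral


def evaluate(P, Ak, M, T, N):
    """Approximate policy evaluation: N integral Bellman updates (assumed horizon H = N*T)."""
    W, Phi = integral_terms(Ak, M, T)
    for _ in range(N):
        P = W + Phi.T @ P @ Phi
    return P


def gpi(A, B, Q, R, K0, T, N, iters=30):
    """Hypothetical model-based GPI loop: warm-started evaluation, then K <- R^{-1} B^T P."""
    K, P = K0, np.zeros_like(Q)
    for _ in range(iters):
        Ak = A - B @ K                   # closed-loop matrix under u = -K x
        M = Q + K.T @ R @ K              # state-cost weight under u = -K x
        P = evaluate(P, Ak, M, T, N)     # reuse the previous P as the starting value
        K = np.linalg.solve(R, B.T @ P)  # policy improvement
    return P, K


# Example system (an assumption, not taken from the paper).
A = np.array([[0.0, 1.0], [-1.0, 0.5]])  # open-loop unstable (Re of eigenvalues = 0.25)
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
K0 = np.array([[1.0, 3.0]])              # stabilizing: A - B K0 has eigenvalues -1.25 +/- 0.66j

# (i) same H = 1.0 reached two ways: ten updates with T = 0.1 versus one update with T = 1.0.
Ak0, M0 = A - B @ K0, Q + K0.T @ R @ K0
P_a = evaluate(np.zeros((2, 2)), Ak0, M0, T=0.1, N=10)
P_b = evaluate(np.zeros((2, 2)), Ak0, M0, T=1.0, N=1)
print("max |P_a - P_b| =", np.abs(P_a - P_b).max())

# (ii) a long horizon approaches the exact value: Ak0^T X + X Ak0 + M0 = 0.
X = solve_continuous_lyapunov(Ak0.T, -M0)
P_long = evaluate(np.zeros((2, 2)), Ak0, M0, T=1.0, N=200)
print("max |P_long - X| =", np.abs(P_long - X).max())

# With a large horizon (H = 10) the loop behaves like PI and should reach the ARE solution.
P_gpi, K_gpi = gpi(A, B, Q, R, K0, T=0.5, N=20)
P_are = solve_continuous_are(A, B, Q, R)
print("max |P_gpi - P_are| =", np.abs(P_gpi - P_are).max())
```

The equivalence in (i) follows from the semigroup property e^{A_K(T₁+T₂)} = e^{A_K T₁}e^{A_K T₂}: two updates of length T compose into a single update of length 2T, so in this sketch only the product N·T matters. Choosing a much smaller N·T would move the loop away from PI-like behavior, which is where the stability conditions discussed in the paper become relevant.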
Bibliographical note
Funding Information:
This work has been supported by the Institute of BioMed-IT, Energy-IT and Smart-IT Technology (BEST), a Brain Korea 21 plus program, Yonsei University. The material in this paper was partially presented at the 50th IEEE Conference on Decision and Control (CDC) and European Control Conference, December 12–15, 2011, Orlando, Florida, USA, and at the 2013 American Control Conference, June 17–19, 2013, Washington, DC, USA. This paper was recommended for publication in revised form by Associate Editor Shuzhi Sam Ge under the direction of Editor Miroslav Krstic.
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Electrical and Electronic Engineering