On integral generalized policy iteration for continuous-time linear quadratic regulations

Jae Young Lee, Jin Bae Park, Yoon Ho Choi

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

This paper mathematically analyzes the integral generalized policy iteration (I-GPI) algorithms applied to a class of continuous-time linear quadratic regulation (LQR) problems with an unknown system matrix A. GPI is the general idea of interleaving the policy evaluation and policy improvement steps of policy iteration (PI) to compute the optimal policy. We first introduce the update horizon H and then show that (i) all I-GPI methods with the same H can be considered equivalent and (ii) the value function approximated in the policy evaluation step converges monotonically to the exact one as H→∞. This reveals the relation between the computational complexity and the update (or time) horizon of I-GPI, as well as the relation between I-PI and I-GPI in the limit H→∞. We also provide and discuss two modes of convergence of I-GPI: in one mode it behaves like PI, and in the other it performs like value iteration for discrete-time LQR and like infinitesimal GPI (H→0). From these results, a new classification of integral reinforcement learning methods with respect to H is formed. Two matrix inequality conditions for stability, the region of local monotone convergence, and data-driven (adaptive) implementation methods are also provided with detailed discussion. Numerical simulations are carried out for verification and further investigation.
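For orientation only, the Python sketch below illustrates the iteration structure the abstract describes (a finite-horizon integral policy-evaluation step with update horizon H, interleaved with an LQR policy-improvement step); it is a model-based sketch, not the paper's data-driven I-GPI algorithm. The plant matrices A, B, Q, R, the horizon H = 0.5, the initial gain K0, and the function name i_gpi_lqr are illustrative assumptions. In particular, A, which I-GPI treats as unknown, is used here only to emulate the closed-loop responses that a data-driven implementation would instead measure from state and input trajectories.

import numpy as np
from scipy.linalg import expm, solve

# Illustrative plant and weights (assumptions, not taken from the paper).
# A is the matrix that I-GPI treats as unknown; it appears here only to
# emulate the closed-loop behavior that a data-driven implementation
# would measure from trajectories.
A = np.array([[0.0, 1.0],
              [-1.0, -2.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.eye(1)

def i_gpi_lqr(A, B, Q, R, H, K0, iterations=60):
    """Model-based sketch of one I-GPI variant for CT-LQR with update horizon H.

    Policy evaluation (partial, over horizon H), with Ac = A - B K_i:
        P_{i+1} = int_0^H exp(Ac' t) (Q + K_i' R K_i) exp(Ac t) dt
                  + exp(Ac' H) P_i exp(Ac H)
    Policy improvement:
        K_{i+1} = R^{-1} B' P_{i+1}
    """
    n = A.shape[0]
    K = K0
    P = np.zeros((n, n))
    for _ in range(iterations):
        Ac = A - B @ K
        W = Q + K.T @ R @ K
        # Van Loan block exponential: the top-right block yields the
        # finite-horizon integral of the closed-loop stage cost.
        M = np.block([[-Ac.T, W],
                      [np.zeros((n, n)), Ac]])
        E = expm(M * H)
        F = E[n:, n:]                 # exp(Ac * H)
        G = F.T @ E[:n, n:]           # int_0^H exp(Ac' t) W exp(Ac t) dt
        P = G + F.T @ P @ F           # horizon-H policy evaluation update
        K = solve(R, B.T @ P)         # policy improvement
    return P, K

P, K = i_gpi_lqr(A, B, Q, R, H=0.5, K0=np.zeros((1, 2)))
print(P, K)  # P should approach the ARE solution (and K the optimal gain)
             # as the iteration proceeds, under the stability conditions
             # discussed in the paper.

Taking H large makes each evaluation step behave like full policy evaluation (PI-like mode), while H→0 recovers the infinitesimal-GPI behavior mentioned in the abstract; this is the classification with respect to H that the paper formalizes.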

Original language: English
Pages (from-to): 475-489
Number of pages: 15
Journal: Automatica
Volume: 50
Issue number: 2
DOI: 10.1016/j.automatica.2013.12.009
Publication status: Published - 2014 Feb 1

Fingerprint

Reinforcement learning
Computational complexity
Computer simulation

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Electrical and Electronic Engineering

Cite this

Lee, Jae Young; Park, Jin Bae; Choi, Yoon Ho. On integral generalized policy iteration for continuous-time linear quadratic regulations. In: Automatica. 2014; Vol. 50, No. 2, pp. 475-489.
@article{4a048c2913f448aba7076ab09850d53d,
title = "On integral generalized policy iteration for continuous-time linear quadratic regulations",
author = "Lee, {Jae Young} and Park, {Jin Bae} and Choi, {Yoon Ho}",
year = "2014",
month = "2",
day = "1",
doi = "10.1016/j.automatica.2013.12.009",
language = "English",
volume = "50",
pages = "475--489",
journal = "Automatica",
issn = "0005-1098",
publisher = "Elsevier Limited",
number = "2",

}
