Adaptive dynamic programming for discrete-time linear quadratic regulation based on multirate generalised policy iteration

Tae Yoon Chun, Jae Young Lee, Jin Bae Park, Yoon Ho Choi

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

In this paper, we propose two multirate generalised policy iteration (GPI) algorithms for discrete-time linear quadratic regulation problems. The proposed algorithms extend the existing GPI algorithm, which consists of an approximate policy evaluation step and a policy improvement step. The two proposed schemes, heuristic dynamic programming (HDP) and dual HDP (DHP) based on multirate GPI, use multi-step estimation (an M-step Bellman equation) at the approximate policy evaluation step to estimate the value function and its gradient, called the costate, respectively. We then show that these two methods with the same update horizon can be considered equivalent in the iteration domain. Furthermore, monotonically increasing and decreasing convergence, the so-called value iteration (VI)-mode and policy iteration (PI)-mode convergence, are proved to hold for the proposed multirate GPIs, and general convergence properties in terms of eigenvalues are also studied. Data-driven online implementation methods for the proposed HDP and DHP are demonstrated, and finally we present the results of numerical simulations performed to verify the effectiveness of the proposed methods.
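The evaluation/improvement loop the abstract describes can be sketched numerically. The following is a minimal, illustrative implementation of an HDP-style multirate GPI for discrete-time LQR, not the paper's actual algorithm: the M-step Bellman recursion under a fixed policy approximates policy evaluation (M = 1 behaves like VI; large M approaches PI), followed by the standard LQR policy improvement. The system matrices A, B, Q, R are hypothetical examples chosen for the sketch.

```python
import numpy as np

# Hypothetical example system x_{k+1} = A x_k + B u_k with quadratic cost
# sum_k (x_k' Q x_k + u_k' R u_k); not taken from the paper.
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.eye(1)

def multirate_gpi(A, B, Q, R, M=3, iters=200):
    """Sketch of multirate GPI for DT-LQR.

    Alternates an M-step approximate policy evaluation with the usual
    LQR policy improvement.  Returns the value-function kernel P
    (V(x) = x' P x) and the feedback gain K (u = -K x).
    """
    n = A.shape[0]
    P = np.zeros((n, n))            # initial value-function kernel
    K = np.zeros((B.shape[1], n))   # initial (stabilising) gain
    for _ in range(iters):
        # Approximate policy evaluation: apply the Bellman recursion
        # M times under the fixed policy u = -K x.
        Ac = A - B @ K              # closed-loop dynamics
        Qc = Q + K.T @ R @ K        # stage cost under the policy
        for _ in range(M):
            P = Qc + Ac.T @ P @ Ac
        # Policy improvement: minimise the one-step Q-function.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

P, K = multirate_gpi(A, B, Q, R, M=3)

# The converged P should (approximately) satisfy the discrete
# algebraic Riccati equation; print the residual norm as a check.
dare_res = (A.T @ P @ A - P + Q
            - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A))
print(np.linalg.norm(dare_res))
```

With M = 1 each outer iteration performs a single Bellman backup (VI-mode); increasing M makes the evaluation step closer to solving the policy's Lyapunov equation exactly (PI-mode), which is the trade-off the multirate framing makes explicit.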

Original language: English
Pages (from-to): 1-18
Number of pages: 18
Journal: International Journal of Control
DOIs: 10.1080/00207179.2017.1312669
Publication status: Accepted/In press - 2017 Apr 1

Fingerprint

  • Dynamic programming
  • Computer simulation

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Computer Science Applications

Cite this

@article{55b9d49e4bc8431aa61ca047b54b225e,
title = "Adaptive dynamic programming for discrete-time linear quadratic regulation based on multirate generalised policy iteration",
abstract = "In this paper, we propose two multirate generalised policy iteration (GPI) algorithms applied to discrete-time linear quadratic regulation problems. The proposed algorithms are extensions of the existing GPI algorithm that consists of the approximate policy evaluation and policy improvement steps. The two proposed schemes, named heuristic dynamic programming (HDP) and dual HDP (DHP), based on multirate GPI, use multi-step estimation (M-step Bellman equation) at the approximate policy evaluation step for estimating the value function and its gradient called costate, respectively. Then, we show that these two methods with the same update horizon can be considered equivalent in the iteration domain. Furthermore, monotonically increasing and decreasing convergences, so called value iteration (VI)-mode and policy iteration (PI)-mode convergences, are proved to hold for the proposed multirate GPIs. Further, general convergence properties in terms of eigenvalues are also studied. The data-driven online implementation methods for the proposed HDP and DHP are demonstrated and finally, we present the results of numerical simulations performed to verify the effectiveness of the proposed methods.",
author = "Chun, {Tae Yoon} and Lee, {Jae Young} and Park, {Jin Bae} and Choi, {Yoon Ho}",
year = "2017",
month = "4",
day = "1",
doi = "10.1080/00207179.2017.1312669",
language = "English",
pages = "1--18",
journal = "International Journal of Control",
issn = "0020-7179",
publisher = "Taylor and Francis Ltd.",

}
