Integral temporal difference learning for continuous-time linear quadratic regulations

Tae Yoon Chun, Jae Young Lee, Jin Bae Park, Yoon Ho Choi

Research output: Contribution to journal › Article


Abstract

In this paper, we propose a temporal difference (TD) learning method, called integral TD learning, that efficiently finds solutions to continuous-time (CT) linear quadratic regulation (LQR) problems in an online fashion when the system matrix A is unknown. The idea originates from a computational reinforcement learning method known as TD(0), the simplest TD method for finite Markov decision processes. For the proposed integral TD method, we mathematically analyze the positive definiteness of the updated value functions, the monotone convergence conditions, and the stability properties concerning the locations of the closed-loop poles in terms of the learning rate and the discount factor. The proposed method includes the existing value iteration method for CT LQR problems as a special case. Finally, numerical simulations are carried out to verify the effectiveness of the proposed method and to further investigate the aforementioned mathematical properties.
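The abstract describes a TD(0)-style update applied to the integral Bellman equation of a CT LQR problem. The paper's exact algorithm is not reproduced here; the following Python sketch only illustrates the general idea under stated assumptions: the system matrices A and B, the fixed stabilizing gain K, and the values of the learning rate, discount factor, and integration interval are all made up for illustration and are not taken from the paper. The accumulated running cost over each interval, plus the discounted value of the interval's end state, minus the current value estimate, forms an integral TD error used to update the quadratic value matrix P.

```python
# A minimal sketch of a TD(0)-style update applied to the integral Bellman
# equation of a CT LQR problem. All numbers below (A, B, Q, R, the fixed
# gain K, alpha, gamma, T, dt) are illustrative assumptions, not values or
# the exact algorithm from the paper.
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # system matrix (used only to simulate data)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                              # state cost weight
R = np.eye(1)                              # input cost weight
K = np.array([[0.5, 0.5]])                 # fixed stabilizing feedback u = -K x

alpha, gamma = 0.1, 0.95                   # learning rate and discount factor
T, dt = 0.05, 0.001                        # TD interval length and Euler step
P = np.zeros((2, 2))                       # value estimate V(x) = x^T P x

rng = np.random.default_rng(0)
for _ in range(200):                       # trajectories from random initial states
    x = rng.normal(size=(2, 1))
    for _ in range(100):                   # TD intervals per trajectory
        x0, cost = x.copy(), 0.0
        for _ in range(round(T / dt)):     # integrate cost and state over [t, t+T]
            u = -K @ x
            cost += (x.T @ Q @ x + u.T @ R @ u).item() * dt
            x = x + (A @ x + B @ u) * dt
        # integral TD error: interval cost + discounted next value - current value
        delta = cost + gamma * (x.T @ P @ x).item() - (x0.T @ P @ x0).item()
        # TD(0)-style semi-gradient step along the quadratic feature x0 x0^T
        P = P + alpha * delta * (x0 @ x0.T)
    P = 0.5 * (P + P.T)                    # keep the estimate symmetric

print("Estimated value matrix P:\n", P)
```

The two knobs the abstract singles out, the learning rate and the discount factor, appear here as alpha and gamma; the paper's analysis (positive definiteness, monotone convergence, closed-loop pole locations) concerns how such choices affect the learned value function, which this sketch does not attempt to reproduce.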

Original language: English
Pages (from-to): 226-238
Number of pages: 13
Journal: International Journal of Control, Automation and Systems
Volume: 15
Issue number: 1
DOI: 10.1007/s12555-015-0319-1
ISSN: 1598-6446
Publisher: Institute of Control, Robotics and Systems
Publication status: Published - 2017 Feb 1

Fingerprint

  • Reinforcement learning
  • Poles
  • Computer simulation

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Computer Science Applications
