Reinforcement Q-learning based on Multirate Generalized Policy Iteration and Its Application to a 2-DOF Helicopter

Tae Yoon Chun, Jin Bae Park, Yoon Ho Choi

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

In this paper, we propose a novel Q-learning method based on multirate generalized policy iteration (MGPI) for unknown discrete-time (DT) linear quadratic regulation (LQR) problems. Q-learning is an effective scheme for unknown dynamical systems because it does not require any knowledge of the system dynamics to solve optimal control problems. By applying the MGPI concept, which extends basic GPI with multirate time-horizon steps, a new Q-learning algorithm is proposed for solving the LQR problem. Further, it is proven that the proposed algorithm converges to an optimal solution, i.e., it learns the optimal control policy iteratively using the state and control-input information. Finally, we employ the two-degree-of-freedom (2-DOF) helicopter model to verify the effectiveness of the proposed method and investigate its convergence properties.
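The abstract describes model-free Q-learning for DT LQR only in words. The sketch below illustrates the basic idea with plain least-squares Q-learning policy iteration, not the authors' MGPI scheme. It assumes the Q-function of a linear policy u = -Kx is quadratic, Q(x, u) = [x; u]' H [x; u], fits H from (x_k, u_k, x_{k+1}) samples through the Bellman equation, and improves the gain as K = H_uu^{-1} H_ux. The plant matrices, weights, sample count, and the function names `q_learning_lqr` and `riccati_gain` are hypothetical choices for this example. A and B are used only to generate data and a model-based reference gain, never by the learner.

```python
import numpy as np


def quad_features(z):
    """Kronecker features: quad_features(z) @ H.ravel() == z @ H @ z."""
    return np.kron(z, z)


def q_learning_lqr(data, Qx, Ru, K0, iters=15):
    """Least-squares Q-learning policy iteration for DT LQR (illustrative sketch).

    data: list of (x, u, x_next) samples; the plant matrices are never used.
    Qx, Ru: stage-cost weights in r = x' Qx x + u' Ru u.
    K0: initial stabilizing gain for the policy u = -K x.
    Returns the final gain and the last evaluated Q-function kernel H.
    """
    n, m = Qx.shape[0], Ru.shape[0]
    p = n + m
    K = K0.copy()
    for _ in range(iters):
        rows, targets = [], []
        for x, u, x_next in data:
            z = np.concatenate([x, u])
            # The next action is taken from the policy being evaluated, so
            # exploratory (off-policy) samples can be reused every iteration.
            z_next = np.concatenate([x_next, -K @ x_next])
            rows.append(quad_features(z) - quad_features(z_next))
            targets.append(x @ Qx @ x + u @ Ru @ u)
        # Policy evaluation: fit H so that z'Hz - z_next'H z_next = r.
        h, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
        H = h.reshape(p, p)
        H = 0.5 * (H + H.T)  # only the symmetric part of H is identifiable
        # Policy improvement: minimizing z'Hz over u gives u = -H_uu^{-1} H_ux x.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K, H


def riccati_gain(A, B, Qx, Ru, iters=2000):
    """Model-based reference gain from Riccati iteration (used only for checking)."""
    P = Qx.copy()
    for _ in range(iters):
        K = np.linalg.solve(Ru + B.T @ P @ B, B.T @ P @ A)
        P = Qx + A.T @ P @ (A - B @ K)
    return K


# Hypothetical plant; A and B only generate data and the reference gain.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
Qx, Ru = np.eye(2), np.eye(1)

rng = np.random.default_rng(0)
data = []
for _ in range(50):
    x = rng.standard_normal(2)
    u = rng.standard_normal(1)  # exploratory input, independent of any policy
    data.append((x, u, A @ x + B @ u))

# K0 = 0 is stabilizing here only because this A is already stable.
K_learned, H_learned = q_learning_lqr(data, Qx, Ru, K0=np.zeros((1, 2)))
K_model = riccati_gain(A, B, Qx, Ru)
print("learned gain:", K_learned)
print("model-based gain:", K_model)
```

With noise-free samples the least-squares fit solves the policy-evaluation Bellman equation exactly, so the learned gain should agree with the model-based Riccati gain to numerical precision. MGPI's multirate evaluation horizons and the paper's convergence conditions are not modeled in this sketch.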

Original language: English
Pages (from-to): 377-386
Number of pages: 10
Journal: International Journal of Control, Automation and Systems
Volume: 16
Issue number: 1
DOI: 10.1007/s12555-017-0172-5
Publication status: Published - 2018 Feb 1

Fingerprint

Helicopters
Reinforcement
Dynamical systems
Learning algorithms

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Computer Science Applications

Cite this

@article{79e60d3f92284715ad7f41755f476e42,
title = "Reinforcement {Q}-learning based on Multirate Generalized Policy Iteration and Its Application to a 2-{DOF} Helicopter",
abstract = "In this paper, we propose a novel Q-learning method based on multirate generalized policy iteration (MGPI) for unknown discrete-time (DT) linear quadratic regulation (LQR) problems. Q-learning is an effective scheme for unknown dynamical systems because it does not require any knowledge of the system dynamics to solve optimal control problems. By applying the MGPI concept, which extends basic GPI with multirate time-horizon steps, a new Q-learning algorithm is proposed for solving the LQR problem. Further, it is proven that the proposed algorithm converges to an optimal solution, i.e., it learns the optimal control policy iteratively using the state and control-input information. Finally, we employ the two-degree-of-freedom (2-DOF) helicopter model to verify the effectiveness of the proposed method and investigate its convergence properties.",
author = "Chun, {Tae Yoon} and Park, {Jin Bae} and Choi, {Yoon Ho}",
year = "2018",
month = "2",
day = "1",
doi = "10.1007/s12555-017-0172-5",
language = "English",
volume = "16",
pages = "377--386",
journal = "International Journal of Control, Automation and Systems",
issn = "1598-6446",
publisher = "Institute of Control, Robotics and Systems",
number = "1",
}

Reinforcement Q-learning based on Multirate Generalized Policy Iteration and Its Application to a 2-DOF Helicopter. / Chun, Tae Yoon; Park, Jin Bae; Choi, Yoon Ho.

In: International Journal of Control, Automation and Systems, Vol. 16, No. 1, 01.02.2018, p. 377-386.


TY  - JOUR
T1  - Reinforcement Q-learning based on Multirate Generalized Policy Iteration and Its Application to a 2-DOF Helicopter
AU  - Chun, Tae Yoon
AU  - Park, Jin Bae
AU  - Choi, Yoon Ho
PY  - 2018/2/1
Y1  - 2018/2/1
N2  - In this paper, we propose a novel Q-learning method based on multirate generalized policy iteration (MGPI) for unknown discrete-time (DT) linear quadratic regulation (LQR) problems. Q-learning is an effective scheme for unknown dynamical systems because it does not require any knowledge of the system dynamics to solve optimal control problems. By applying the MGPI concept, which extends basic GPI with multirate time-horizon steps, a new Q-learning algorithm is proposed for solving the LQR problem. Further, it is proven that the proposed algorithm converges to an optimal solution, i.e., it learns the optimal control policy iteratively using the state and control-input information. Finally, we employ the two-degree-of-freedom (2-DOF) helicopter model to verify the effectiveness of the proposed method and investigate its convergence properties.
AB  - In this paper, we propose a novel Q-learning method based on multirate generalized policy iteration (MGPI) for unknown discrete-time (DT) linear quadratic regulation (LQR) problems. Q-learning is an effective scheme for unknown dynamical systems because it does not require any knowledge of the system dynamics to solve optimal control problems. By applying the MGPI concept, which extends basic GPI with multirate time-horizon steps, a new Q-learning algorithm is proposed for solving the LQR problem. Further, it is proven that the proposed algorithm converges to an optimal solution, i.e., it learns the optimal control policy iteratively using the state and control-input information. Finally, we employ the two-degree-of-freedom (2-DOF) helicopter model to verify the effectiveness of the proposed method and investigate its convergence properties.
UR  - http://www.scopus.com/inward/record.url?scp=85040663912&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85040663912&partnerID=8YFLogxK
U2  - 10.1007/s12555-017-0172-5
DO  - 10.1007/s12555-017-0172-5
M3  - Article
AN  - SCOPUS:85040663912
VL  - 16
SP  - 377
EP  - 386
JO  - International Journal of Control, Automation and Systems
JF  - International Journal of Control, Automation and Systems
SN  - 1598-6446
IS  - 1
ER  - 