TY - JOUR
T1 - Reinforcement Q-learning based on Multirate Generalized Policy Iteration and Its Application to a 2-DOF Helicopter
AU - Chun, Tae Yoon
AU - Park, Jin Bae
AU - Choi, Yoon Ho
N1 - Publisher Copyright:
© 2018, Institute of Control, Robotics and Systems and The Korean Institute of Electrical Engineers and Springer-Verlag GmbH Germany, part of Springer Nature.
Copyright:
Copyright 2018 Elsevier B.V., All rights reserved.
PY - 2018/2/1
Y1 - 2018/2/1
N2 - In this paper, we propose a novel Q-learning method based on multirate generalized policy iteration (MGPI) for discrete-time (DT) linear quadratic regulation (LQR) problems with unknown system dynamics. Q-learning is effective for unknown dynamical systems because it solves optimal control problems without requiring any knowledge of the system dynamics. By applying the MGPI concept, which extends basic GPI with multirate time-horizon steps, we derive a new Q-learning algorithm for solving the LQR problem. We further prove that the proposed algorithm converges to the optimal solution, i.e., it iteratively learns the optimal control policy using state and control-input information. Finally, we apply the proposed method to a two-degree-of-freedom (2-DOF) helicopter model to verify its effectiveness and investigate its convergence properties.
AB - In this paper, we propose a novel Q-learning method based on multirate generalized policy iteration (MGPI) for discrete-time (DT) linear quadratic regulation (LQR) problems with unknown system dynamics. Q-learning is effective for unknown dynamical systems because it solves optimal control problems without requiring any knowledge of the system dynamics. By applying the MGPI concept, which extends basic GPI with multirate time-horizon steps, we derive a new Q-learning algorithm for solving the LQR problem. We further prove that the proposed algorithm converges to the optimal solution, i.e., it iteratively learns the optimal control policy using state and control-input information. Finally, we apply the proposed method to a two-degree-of-freedom (2-DOF) helicopter model to verify its effectiveness and investigate its convergence properties.
UR - http://www.scopus.com/inward/record.url?scp=85040663912&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040663912&partnerID=8YFLogxK
U2 - 10.1007/s12555-017-0172-5
DO - 10.1007/s12555-017-0172-5
M3 - Article
AN - SCOPUS:85040663912
VL - 16
SP - 377
EP - 386
JO - International Journal of Control, Automation and Systems
JF - International Journal of Control, Automation and Systems
SN - 1598-6446
IS - 1
ER -