Integral reinforcement learning for continuous-time input-affine nonlinear systems with simultaneous invariant explorations

Jae Young Lee, Jin Bae Park, Yoon Ho Choi

Research output: Contribution to journal › Article

36 Citations (Scopus)

Abstract

This paper focuses on a class of reinforcement learning (RL) algorithms, named integral RL (I-RL), that solve continuous-time (CT) nonlinear optimal control problems with input-affine system dynamics. First, we extend the concepts of exploration, integral temporal difference, and invariant admissibility to the target CT nonlinear system, which is governed by a control policy plus a probing signal called an exploration. Then, we show input-to-state stability (ISS) and invariant admissibility of the closed-loop systems under the policies generated by the integral policy iteration (I-PI) and invariantly admissible PI (IA-PI) methods. Based on these results, three online I-RL algorithms, named explorized I-PI and integral $Q$-learning I and II, are proposed, all of which generate the same convergent sequences as I-PI and IA-PI under the required excitation condition on the exploration. All the proposed methods are partially or completely model free, and can simultaneously explore the state space in a stable manner during the online learning process. ISS, invariant admissibility, and convergence properties of the proposed methods are also investigated, and in relation to these, we present design principles for the exploration to guarantee safe learning. Neural-network-based implementation methods for the proposed schemes are also presented. Finally, several numerical simulations are carried out to verify the effectiveness of the proposed methods.
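For orientation, I-RL for input-affine dynamics $\dot{x} = f(x) + g(x)u$ with running cost $r(x,u) = Q(x) + u^{\top} R u$ rests on an integral form of the Bellman equation. The following is a minimal sketch in standard I-PI notation ($V_i$, $\mu_i$, and the interval length $T$ are generic symbols, not necessarily the paper's exact notation), showing how an additive exploration $e$ enters the policy-evaluation step:

Policy evaluation (integral temporal difference along a trajectory driven by $u = \mu_i(x) + e$):

$$V_i(x(t)) = \int_t^{t+T} \left[ Q(x) + \mu_i^{\top} R \mu_i + 2\,\mu_{i+1}^{\top} R\, e \right] d\tau + V_i(x(t+T))$$

Policy improvement:

$$\mu_{i+1}(x) = -\tfrac{1}{2} R^{-1} g^{\top}(x)\, \nabla V_i(x)$$

With $e \equiv 0$, the first relation reduces to the usual I-PI policy-evaluation equation; the extra term $2\mu_{i+1}^{\top} R e$ compensates for the probing signal, so the same sequence $(V_i, \mu_{i+1})$ is recovered under excitation. Since the drift $f$ never appears explicitly, the evaluation step is model free with respect to $f$, and solving the relation for $V_i$ and $\mu_{i+1}$ simultaneously also removes the need for $g$, which is the idea behind the $Q$-learning-style variants.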

Original language: English
Article number: 6882245
Pages (from-to): 916-932
Number of pages: 17
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 26
Issue number: 5
DOI: 10.1109/TNNLS.2014.2328590
Publication status: Published - 2015 May 1

Fingerprint

  • Reinforcement learning
  • Nonlinear systems
  • Closed loop systems
  • Learning algorithms
  • Dynamical systems
  • Neural networks
  • Computer simulation

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

@article{5c435e4fb44444fb90d7dc807f08b1fe,
title = "Integral reinforcement learning for continuous-time input-affine nonlinear systems with simultaneous invariant explorations",
author = "Lee, {Jae Young} and Park, {Jin Bae} and Choi, {Yoon Ho}",
year = "2015",
month = "5",
day = "1",
doi = "10.1109/TNNLS.2014.2328590",
language = "English",
volume = "26",
pages = "916--932",
journal = "IEEE Transactions on Neural Networks and Learning Systems",
issn = "2162-237X",
publisher = "IEEE Computational Intelligence Society",
number = "5",
}
