In this paper, we propose a temporal difference (TD) learning method, called integral TD learning, that efficiently finds solutions to continuous-time (CT) linear quadratic regulation (LQR) problems online when the system matrix A is unknown. The idea originates from a computational reinforcement learning method known as TD(0), which is the simplest TD method for finite Markov decision processes. For the proposed integral TD method, we mathematically analyze the positive definiteness of the updated value functions, the conditions for monotone convergence, and the stability properties, expressed through the locations of the closed-loop poles, in terms of the learning rate and the discount factor. The proposed method includes the existing value iteration method for CT LQR problems as a special case. Finally, numerical simulations are carried out to verify the effectiveness of the proposed method and to further investigate the aforementioned mathematical properties.
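To make the idea concrete, here is a minimal sketch of a TD(0)-style integral update on a scalar CT LQR problem. It is an illustration under stated assumptions, not the paper's algorithm: the plant parameters, the sampling interval `T`, the function names `measure` and `integral_td`, and the exact form of the update are all hypothetical, and no discounting is applied. The learner uses only the input gain `b` and the weights `q`, `r`; the plant pole `a` appears only inside the stand-in for measured data. With learning rate `alpha = 1` the update reduces to a value-iteration-style step.

```python
import math

# Hypothetical scalar plant dx/dt = a*x + b*u with cost integrand q*x^2 + r*u^2.
# A_TRUE is used only to generate "measurements"; the learner never reads it.
A_TRUE, B, Q, R = -1.0, 1.0, 1.0, 1.0


def measure(k, x0, T):
    """Stand-in for online data under u = -k*x on [0, T].

    Returns the integral cost over the interval and the final state.
    """
    a_c = A_TRUE - B * k  # closed-loop pole (nonzero for these values)
    x_T = math.exp(a_c * T) * x0
    # Closed form of the integral of (q + r*k^2) * x(t)^2 over [0, T].
    cost = (Q + R * k * k) * x0 * x0 * math.expm1(2.0 * a_c * T) / (2.0 * a_c)
    return cost, x_T


def integral_td(p0=0.0, alpha=0.5, T=0.1, x0=1.0, iters=500):
    """Iterate a TD(0)-like update on the value parameter p, where V(x) = p*x^2."""
    p = p0
    for _ in range(iters):
        k = B * p / R  # greedy gain for the current value estimate
        cost, x_T = measure(k, x0, T)
        # TD target: measured integral cost plus the bootstrapped value at x(T).
        target = (cost + p * x_T * x_T) / (x0 * x0)
        p += alpha * (target - p)  # step toward the target with learning rate alpha
    return p


p_learned = integral_td()
# Positive root of the scalar Riccati equation 2*a*p - (b^2/r)*p^2 + q = 0,
# computed here only to check the learned value.
p_riccati = R * (A_TRUE + math.sqrt(A_TRUE**2 + B**2 * Q / R)) / B**2
print(p_learned, p_riccati)
```

In this scalar setting a single measured interval determines the target; a matrix-valued version would need data from several intervals to identify the value matrix, which this sketch does not attempt.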
| Number of pages | 13 |
| Journal | International Journal of Control, Automation and Systems |
| Publication status | Published - 2017 Feb 1 |
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Computer Science Applications