Prediction of Reward Functions for Deep Reinforcement Learning via Gaussian Process Regression

Jaehyun Lim, Seungchul Ha, Jongeun Choi

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Inverse reinforcement learning (IRL) is a technique for automatic reward acquisition; however, it is difficult to apply to high-dimensional problems with unknown dynamics. This article proposes an efficient way to solve the IRL problem based on sparse Gaussian process (GP) prediction with $\ell_1$-regularization, using only a highly limited number of expert demonstrations. A GP model is trained to predict a reward function using trajectory-reward pair data generated by deep reinforcement learning with different reward functions. The trained GP successfully predicts the reward functions of human experts from their collected demonstration trajectory datasets. To demonstrate the proposed approach, it is applied to obstacle avoidance navigation of a mobile robot. The experimental results clearly show that the robots can clone the experts' optimality in obstacle-avoiding navigation trajectories using only a very small number of expert demonstration datasets (e.g., $\leq 6$). Therefore, the proposed approach shows great potential to be applied to complex real-world applications in an expert data-efficient manner.
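The core idea in the abstract, regressing from trajectories to the reward functions that produced them, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain GP regression with a squared-exponential kernel rather than the sparse, $\ell_1$-regularized GP of the article, and the function `toy_rollout_features`, the two-dimensional reward weights, and all numeric settings are hypothetical stand-ins for "train deep reinforcement learning under a given reward, then summarize the resulting trajectory".

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)

def toy_rollout_features(w):
    """Hypothetical stand-in for running deep RL with reward weights w and
    summarizing the resulting trajectory as a feature vector (assumption)."""
    return np.array([np.tanh(w[0]), w[1] + 0.1 * w[0] ** 2])

def fit_gp(X, Y, noise=1e-4, length_scale=1.0):
    """Return the GP weight matrix alpha = (K + noise * I)^{-1} Y."""
    K = rbf_kernel(X, X, length_scale) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    return np.linalg.solve(L.T, np.linalg.solve(L, Y))

def predict_gp(X_train, alpha, X_new, length_scale=1.0):
    """GP posterior mean at the rows of X_new."""
    return rbf_kernel(X_new, X_train, length_scale) @ alpha

# Trajectory-reward pairs: sample reward weights, "roll out" each one.
rng = np.random.default_rng(0)
W_train = rng.uniform(-1.0, 1.0, size=(200, 2))          # reward weights
X_train = np.array([toy_rollout_features(w) for w in W_train])
alpha = fit_gp(X_train, W_train)

# An "expert" whose reward weights are unknown to the GP.
w_expert = np.array([0.3, -0.5])
x_expert = toy_rollout_features(w_expert)[None, :]
w_pred = predict_gp(X_train, alpha, x_expert)[0]
print("predicted reward weights:", w_pred)
```

In this toy setting the predicted weights can be compared directly with `w_expert`; with real demonstrations the true reward is unknown, so a predicted reward would instead be judged by how well a policy trained on it reproduces the expert's behavior.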

Original language: English
Article number: 9091100
Pages (from-to): 1739-1746
Number of pages: 8
Journal: IEEE/ASME Transactions on Mechatronics
Volume: 25
Issue number: 4
Publication status: Published - 2020 Aug

Bibliographical note

Funding Information:
Manuscript received January 9, 2020; revised March 27, 2020; accepted April 24, 2020. Date of publication May 11, 2020; date of current version August 13, 2020. This work was supported in part by the Technology Innovation Program through the Korea Evaluation Institute of Industrial Technology funded by the Ministry of Trade, Industry and Energy (10073129), and in part by the Mid-Career Research Program through the National Research Foundation of Korea funded by the Ministry of Science and ICT (NRF-2018R1A2B6008063). Recommended by Technical Editor H. R. Karimi and Senior Editor X. Chen. (Corresponding author: Jongeun Choi.) The authors are with the School of Mechanical Engineering, Yonsei University, Seoul 03722, South Korea (e-mail: jaehyunlim@yonsei.ac.kr; seungchul0406@gmail.com; jongeunchoi@yonsei.ac.kr).

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Computer Science Applications
  • Electrical and Electronic Engineering
