Inverse reinforcement learning (IRL) is a technique for automatic reward acquisition; however, it is difficult to apply to high-dimensional problems with unknown dynamics. This article proposes an efficient way to solve the IRL problem based on sparse Gaussian process (GP) prediction with ℓ1-regularization, using only a highly limited number of expert demonstrations. A GP model is trained to predict a reward function from trajectory-reward pair data generated by deep reinforcement learning with different reward functions. The trained GP successfully predicts the reward functions of human experts from their collected demonstration trajectory datasets. To demonstrate the proposed approach, it is applied to obstacle avoidance navigation of a mobile robot. The experimental results clearly show that the robots can clone the experts' optimality in navigation trajectories avoiding obstacles using only a very small number of expert demonstration datasets (e.g., ≤ 6). Therefore, the proposed approach shows great potential to be applied to complex real-world applications in an expert data-efficient manner.
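The core idea of the abstract, learning a map from trajectories to the reward function that produced them and then querying that map with expert trajectories, can be illustrated with a minimal sketch. This is not the article's implementation: it uses plain GP regression with an RBF kernel rather than the sparse, ℓ1-regularized GP, and the function `toy_trajectory_features` is a hypothetical stand-in for training a deep reinforcement learning agent under given reward weights and summarizing its trajectories. All names, the two-dimensional reward parameterization, and the hyperparameters are assumptions chosen for illustration only.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_fit(X, Y, noise=1e-4, length_scale=1.0):
    # Standard GP regression: solve (K + noise * I) alpha = Y via Cholesky.
    K = rbf_kernel(X, X, length_scale) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    return {"X": X, "alpha": alpha, "length_scale": length_scale}

def gp_predict(model, X_new):
    # Posterior mean at the new inputs.
    K_star = rbf_kernel(X_new, model["X"], model["length_scale"])
    return K_star @ model["alpha"]

def toy_trajectory_features(w):
    # HYPOTHETICAL stand-in for "run deep RL with reward weights w and
    # summarize the resulting trajectories as a feature vector".
    return np.array([np.tanh(w[0]), np.tanh(w[1]), w[0] * w[1]])

# Build trajectory-reward pairs from many different (assumed) reward weights.
rng = np.random.default_rng(0)
reward_weights = rng.uniform(-1.0, 1.0, size=(50, 2))
features = np.array([toy_trajectory_features(w) for w in reward_weights])

# Train the GP to map trajectory features back to reward weights.
model = gp_fit(features, reward_weights)

# Query with features of an "expert" whose true weights are (0.3, -0.5).
expert_features = toy_trajectory_features(np.array([0.3, -0.5]))[None, :]
predicted_weights = gp_predict(model, expert_features)
print(predicted_weights)
```

In this toy setting the GP is queried once per expert demonstration summary; a sparsity-inducing regularizer such as the one named in the abstract would additionally limit how many training pairs contribute to each prediction.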
Bibliographical note
Funding Information:
Manuscript received January 9, 2020; revised March 27, 2020; accepted April 24, 2020. Date of publication May 11, 2020; date of current version August 13, 2020. This work was supported in part by the Technology Innovation Program through the Korea Evaluation Institute of Industrial Technology funded by the Ministry of Trade, Industry and Energy (10073129), and in part by the Mid-Career Research Program through the National Research Foundation of Korea funded by the Ministry of Science and ICT (NRF-2018R1A2B6008063). Recommended by Technical Editor H. R. Karimi and Senior Editor X. Chen. (Corresponding author: Jongeun Choi.) The authors are with the School of Mechanical Engineering, Yonsei University, Seoul 03722, South Korea (e-mail: firstname.lastname@example.org; email@example.com; firstname.lastname@example.org).
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Computer Science Applications
- Electrical and Electronic Engineering