High Path Converging Reward Space: Imitating Human Behavior in Robot Arm Control

Da Hyun Koh, Seong Won Jang, Hyunseok Yang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper, we propose High Path Converging Reward Space (HPCR), a reward shaping method for reinforcement learning. HPCR helps robot arms learn quickly and stably in environments that require specific actions and in which rewards are difficult to obtain. When a person must reach a distant goal, for example by throwing, they tend first to settle on the direction and then to increase speed or strength. To imitate this behavior, the initial reward space is defined as the space between the robot arm and the goal point, which lets the robot arm learn along the path in that direction. Then, once the robot arm has accumulated a certain number of successes or episodes, the reward space gradually converges toward the goal point until it matches the actual goal size. In addition, even in complicated environments, HPCR can minimize the extent of the reward space along the z-axis by considering only the value at the maximum height of the goal. Using HPCR with the SAC+HER and TQC+HER algorithms, the robot arm can stably reach positions that cannot be reached with the existing reward space.
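The converging reward space described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ConvergingRewardSpace`, the parameters `shrink_every` and `shrink_step`, the tube-shaped region around the straight start-to-goal segment, and the sparse 0 / -1 reward convention are all assumptions made for illustration. The z-axis treatment mentioned in the abstract is not modeled.

```python
import math


def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (3D tuples)."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    # Projection parameter clamped to [0, 1]; a degenerate segment is a point.
    t = 0.0 if denom == 0.0 else max(
        0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom)
    )
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.sqrt(sum((pi - ci) ** 2 for pi, ci in zip(p, closest)))


class ConvergingRewardSpace:
    """Hypothetical reward region that shrinks from a path to the goal."""

    def __init__(self, start, goal, goal_radius, shrink_every=10, shrink_step=0.25):
        self.start = start              # assumed initial end-effector position
        self.goal = goal                # goal point
        self.goal_radius = goal_radius  # size of the actual goal region
        self.shrink_every = shrink_every  # successes between shrink steps (assumed)
        self.shrink_step = shrink_step    # fraction of the path removed per step (assumed)
        self.fraction = 1.0             # fraction of the path still rewarded
        self.successes = 0

    def near_end(self):
        """Point on the path where the rewarded region currently begins."""
        f = self.fraction
        return tuple(g + f * (s - g) for s, g in zip(self.start, self.goal))

    def reward(self, position):
        """Sparse reward: 0.0 inside the current region, -1.0 outside."""
        d = point_to_segment_distance(position, self.near_end(), self.goal)
        return 0.0 if d <= self.goal_radius else -1.0

    def record_success(self):
        """Count a success and shrink the region toward the goal on schedule."""
        self.successes += 1
        if self.successes % self.shrink_every == 0:
            self.fraction = max(0.0, self.fraction - self.shrink_step)
```

In this sketch the rewarded region starts as a tube of radius `goal_radius` around the whole path, and each shrink step moves its near end closer to the goal. Once `fraction` reaches 0, the region is a sphere of radius `goal_radius` around the goal, which corresponds to the abstract's statement that the reward space eventually matches the actual goal size.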

Original language: English
Title of host publication: 2022 22nd International Conference on Control, Automation and Systems, ICCAS 2022
Publisher: IEEE Computer Society
Pages: 680-685
Number of pages: 6
ISBN (Electronic): 9788993215243
Publication status: Published - 2022
Event: 22nd International Conference on Control, Automation and Systems, ICCAS 2022 - Busan, Korea, Republic of
Duration: 2022 Nov 27 to 2022 Dec 1

Publication series

Name: International Conference on Control, Automation and Systems
Volume: 2022-November
ISSN (Print): 1598-7833

Conference

Conference: 22nd International Conference on Control, Automation and Systems, ICCAS 2022
Country/Territory: Korea, Republic of
City: Busan
Period: 22/11/27 to 22/12/1

Bibliographical note

Publisher Copyright:
© 2022 ICROS.

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Science Applications
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
