Training a robot that engages with people is challenging, because it is expensive to involve people in a robot training process requiring numerous data samples. This paper proposes a human path prediction network (HPPN) and an evolution strategy-based robot training method using virtual human movements generated by the HPPN, which compensates for this sample inefficiency problem. We applied the proposed method to the training of a robotic guide for visually impaired people, which was designed to collect multimodal human response data and reflect such data when selecting the robot’s actions. We collected 1,507 real-world episodes for training the HPPN and then generated over 100,000 virtual episodes for training the robot policy. User test results indicate that our trained robot accurately guides blindfolded participants along a goal path. In addition, by the designed reward to pursue both guidance accuracy and human comfort during the robot policy training process, our robot leads to improved smoothness in human motion while maintaining the accuracy of the guidance. This sample-efficient training method is expected to be widely applicable to all robots and computing machinery that physically interact with humans.
|Publication status||Published - 2020 Aug 11|
All Science Journal Classification (ASJC) codes