Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning

Han Cha, Jihong Park, Hyesung Kim, Mehdi Bennis, Seong Lyun Kim

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)


Traditional distributed deep reinforcement learning (RL) commonly relies on exchanging the experience replay memory (RM) of each agent. Since the RM contains all state observations and action policy history, it may incur huge communication overhead while violating the privacy of each agent. Alternatively, this article presents a communication-efficient and privacy-preserving distributed RL framework, coined federated reinforcement distillation (FRD). In FRD, each agent exchanges its proxy experience RM (ProxRM), in which policies are locally averaged with respect to proxy states clustering actual states. To provide FRD design insights, we present ablation studies on the impact of ProxRM structures, neural network architectures, and communication intervals. Furthermore, we propose an improved version of FRD, coined mixup augmented FRD (MixFRD), in which ProxRM is interpolated using the mixup data augmentation algorithm. Simulations in a Cartpole environment validate the effectiveness of MixFRD in reducing the variance of mission completion time and communication cost, compared to the benchmark schemes, vanilla FRD, federated RL (FRL), and policy distillation.
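To make the two key ideas in the abstract concrete, the sketch below illustrates (a) building a proxy experience RM by clustering actual states into proxy states and locally averaging the associated policies, and (b) interpolating that ProxRM with mixup, as in MixFRD. This is a minimal illustration, not the paper's implementation: it assumes 1-D states with uniform binning as the clustering scheme (Cartpole states are 4-D, and the paper's clustering may differ), and the function names are invented for this example.

```python
import numpy as np

def build_proxrm(states, policies, bins):
    """Cluster actual states into proxy states (here: uniform 1-D binning as a
    stand-in for any clustering scheme) and average the policy vectors that
    fall into each proxy state."""
    edges = np.linspace(states.min(), states.max(), bins + 1)
    idx = np.clip(np.digitize(states, edges) - 1, 0, bins - 1)
    proxrm = {}
    for b in range(bins):
        mask = idx == b
        if mask.any():
            proxy_state = 0.5 * (edges[b] + edges[b + 1])  # bin centroid
            proxrm[proxy_state] = policies[mask].mean(axis=0)
    return proxrm

def mixup_proxrm(proxrm, alpha, n_samples, rng):
    """Augment the ProxRM with mixup: draw lam ~ Beta(alpha, alpha) and form
    convex combinations lam*x_i + (1-lam)*x_j of proxy states and policies."""
    keys = np.array(list(proxrm.keys()))
    vals = np.stack([proxrm[k] for k in keys])
    i = rng.integers(len(keys), size=n_samples)
    j = rng.integers(len(keys), size=n_samples)
    lam = rng.beta(alpha, alpha, size=n_samples)
    mixed_states = lam * keys[i] + (1 - lam) * keys[j]
    mixed_policies = lam[:, None] * vals[i] + (1 - lam)[:, None] * vals[j]
    return mixed_states, mixed_policies
```

Because each mixed policy is a convex combination of averaged policy distributions, it remains a valid distribution; agents would exchange only the (proxy state, averaged policy) pairs rather than their raw replay memories.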

Original language: English
Article number: 9094324
Pages (from-to): 94-101
Number of pages: 8
Journal: IEEE Intelligent Systems
Issue number: 4
Publication status: Published - 1 Jul 2020

Bibliographical note

Funding Information:
This article was supported in part by a grant to the Bio-Mimetic Robot Research Center funded by the Defense Acquisition Program Administration and the Agency for Defense Development (UD190018ID), and in part by the Academy of Finland under Grant 294128.

Publisher Copyright:
© 2001-2011 IEEE.

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Artificial Intelligence

