Q-Learning Algorithms: A Comprehensive Classification and Applications

Beakcheol Jang, Myeonghwi Kim, Gaspard Harerimana, Jong Wook Kim

Research output: Contribution to journal › Article › peer-review

124 Citations (Scopus)

Abstract

Q-learning is arguably one of the most widely applied reinforcement learning approaches and is an off-policy method. Since the emergence of Q-learning, many studies have described its uses in reinforcement learning and artificial intelligence problems. However, there is an information gap as to how these powerful algorithms can be leveraged and incorporated into a general artificial intelligence workflow. Early Q-learning algorithms were unsatisfactory in several respects and covered a narrow range of applications. It has also been observed that this rather powerful algorithm sometimes learns unrealistically and overestimates the action values, thereby degrading overall performance. Recently, with the general advances in machine learning, more variants of Q-learning, such as Deep Q-learning, which combines basic Q-learning with deep neural networks, have been developed and applied extensively. In this paper, we thoroughly explain how Q-learning evolved by unraveling the mathematical complexities behind it as well as its flow from the reinforcement learning family of algorithms. Improved variants are fully described, and we categorize Q-learning algorithms into single-agent and multi-agent approaches. Finally, we thoroughly investigate up-to-date research trends and key applications that leverage Q-learning algorithms.

Original language: English
Article number: 8836506
Pages (from-to): 133653-133667
Number of pages: 15
Journal: IEEE Access
Volume: 7
DOIs
Publication status: Published - 2019

Bibliographical note

Publisher Copyright:
© 2013 IEEE.

All Science Journal Classification (ASJC) codes

  • Computer Science (all)
  • Materials Science (all)
  • Engineering (all)
