Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Linjie Xu
  • Zichuan Liu
  • Alexander Dockhorn
  • Diego Perez-Liebana
  • Jinyu Wang
  • Lei Song
  • Jiang Bian

Research Organisations

External Research Organisations

  • Queen Mary University of London
  • Nanjing University
  • Microsoft Corporation

Details

Original language: English
Title of host publication: Proceedings of the 2024 IEEE Conference on Games, CoG 2024
Publisher: IEEE Computer Society
ISBN (electronic): 979-8-3503-5067-8
ISBN (print): 979-8-3503-5068-5
Publication status: Published - 5 Aug 2024
Event: 6th Annual IEEE Conference on Games, CoG 2024 - Milan, Italy
Duration: 5 Aug 2024 - 8 Aug 2024

Publication series

Name: IEEE Conference on Computational Intelligence and Games, CIG
ISSN (print): 2325-4270
ISSN (electronic): 2325-4289

Abstract

One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency, we look at the widely used episodic training mechanism. In each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of the gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate 3 MARL methods on 6 SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency for MARL algorithms. The codes to reimplement the results presented in this paper are open-sourced at https://github.com/egg-west/rr-for-MARL.
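
The intervention described in the abstract is easy to state in code: instead of taking a single gradient step after each collected episode, the learner takes several, reusing data already sitting in the replay buffer. The sketch below is a minimal, illustrative Python toy written for this record, not the authors' released implementation (that is at the GitHub link above); every name in it (ToyEnv, collect_episode, gradient_step, replay_ratio) is an assumption made for illustration, and in the paper's SMAC experiments the same loop structure would wrap the evaluated MARL learners rather than a tabular update.

import random
from collections import deque

random.seed(0)


class ToyEnv:
    """1-D chain environment standing in for a real task (e.g. a SMAC map)."""

    def __init__(self, length=10):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: 0 = move left, 1 = move right
        self.pos = max(0, min(self.length - 1, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.length - 1
        return self.pos, (1.0 if done else 0.0), done


def collect_episode(env, q, eps=0.2, max_steps=50):
    """Collect one episode of transitions (the 'tens of frames' per training step)."""
    s, traj = env.reset(), []
    for _ in range(max_steps):
        a = random.randint(0, 1) if random.random() < eps else max((0, 1), key=lambda x: q[(s, x)])
        s2, r, done = env.step(a)
        traj.append((s, a, r, s2, done))
        s = s2
        if done:
            break
    return traj


def gradient_step(q, batch, lr=0.1, gamma=0.99):
    """One tabular TD update, standing in for a gradient step of the MARL learner."""
    for s, a, r, s2, done in batch:
        target = r if done else r + gamma * max(q[(s2, 0)], q[(s2, 1)])
        q[(s, a)] += lr * (target - q[(s, a)])


def train(replay_ratio=1, episodes=200, batch_size=32):
    env = ToyEnv()
    q = {(s, a): 0.0 for s in range(10) for a in (0, 1)}
    buffer = deque(maxlen=5000)
    for _ in range(episodes):
        buffer.extend(collect_episode(env, q))
        # The key change: perform `replay_ratio` update steps per collected
        # episode instead of the conventional single step.
        for _ in range(replay_ratio):
            if len(buffer) >= batch_size:
                gradient_step(q, random.sample(list(buffer), batch_size))
    return q


if __name__ == "__main__":
    train(replay_ratio=1)  # conventional episodic training: 1 update per episode
    train(replay_ratio=4)  # higher replay ratio: 4 updates per episode

Running train(replay_ratio=1) against train(replay_ratio=4) mirrors the paper's comparison in miniature: the same number of environment interactions, but four times as many update steps on the replayed data.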

Keywords

    Multi-Agent Reinforcement Learning, Reinforcement Learning, Sample efficiency, Starcraft II

ASJC Scopus subject areas

Cite this

Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. / Xu, Linjie; Liu, Zichuan; Dockhorn, Alexander et al.
Proceedings of the 2024 IEEE Conference on Games, CoG 2024. IEEE Computer Society, 2024. (IEEE Conference on Computational Intelligence and Games, CIG).

Research output: Chapter in book/report/conference proceedingConference contributionResearchpeer review

Xu, L, Liu, Z, Dockhorn, A, Perez-Liebana, D, Wang, J, Song, L & Bian, J 2024, Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. in Proceedings of the 2024 IEEE Conference on Games, CoG 2024. IEEE Conference on Computational Intelligence and Games, CIG, IEEE Computer Society, 6th Annual IEEE Conference on Games, CoG 2024, Milan, Italy, 5 Aug 2024. https://doi.org/10.48550/arXiv.2404.09715, https://doi.org/10.1109/CoG60054.2024.10645658
Xu, L., Liu, Z., Dockhorn, A., Perez-Liebana, D., Wang, J., Song, L., & Bian, J. (2024). Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. In Proceedings of the 2024 IEEE Conference on Games, CoG 2024 (IEEE Conference on Computational Intelligence and Games, CIG). IEEE Computer Society. https://doi.org/10.48550/arXiv.2404.09715, https://doi.org/10.1109/CoG60054.2024.10645658
Xu L, Liu Z, Dockhorn A, Perez-Liebana D, Wang J, Song L et al. Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. In Proceedings of the 2024 IEEE Conference on Games, CoG 2024. IEEE Computer Society. 2024. (IEEE Conference on Computational Intelligence and Games, CIG). doi: 10.48550/arXiv.2404.09715, 10.1109/CoG60054.2024.10645658
Xu, Linjie ; Liu, Zichuan ; Dockhorn, Alexander et al. / Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. Proceedings of the 2024 IEEE Conference on Games, CoG 2024. IEEE Computer Society, 2024. (IEEE Conference on Computational Intelligence and Games, CIG).
@inproceedings{8dee49296a804a38ad3697efe4e17ea6,
title = "Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning",
abstract = "One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency, we look at the widely used episodic training mechanism. In each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of the gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate 3 MARL methods on 6 SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency for MARL algorithms. The codes to reimplement the results presented in this paper are open-sourced at https://github.com/egg-west/rr-for-MARL.",
keywords = "Multi-Agent Reinforcement Learning, Reinforcement Learning, Sample efficiency, Starcraft II",
author = "Linjie Xu and Zichuan Liu and Alexander Dockhorn and Diego Perez-Liebana and Jinyu Wang and Lei Song and Jiang Bian",
note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 6th Annual IEEE Conference on Games, CoG 2024 ; Conference date: 05-08-2024 Through 08-08-2024",
year = "2024",
month = aug,
day = "5",
doi = "10.48550/arXiv.2404.09715",
language = "English",
isbn = "979-8-3503-5068-5",
series = "IEEE Conference on Computatonal Intelligence and Games, CIG",
publisher = "IEEE Computer Society",
booktitle = "Proceedings of the 2024 IEEE Conference on Games, CoG 2024",
address = "United States",

}


TY - GEN

T1 - Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

AU - Xu, Linjie

AU - Liu, Zichuan

AU - Dockhorn, Alexander

AU - Perez-Liebana, Diego

AU - Wang, Jinyu

AU - Song, Lei

AU - Bian, Jiang

N1 - Publisher Copyright: © 2024 IEEE.

PY - 2024/8/5

Y1 - 2024/8/5

N2 - One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency, we look at the widely used episodic training mechanism. In each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of the gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate 3 MARL methods on 6 SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency for MARL algorithms. The codes to reimplement the results presented in this paper are open-sourced at https://github.com/egg-west/rr-for-MARL.

AB - One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency, we look at the widely used episodic training mechanism. In each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of the gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate 3 MARL methods on 6 SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency for MARL algorithms. The codes to reimplement the results presented in this paper are open-sourced at https://github.com/egg-west/rr-for-MARL.

KW - Multi-Agent Reinforcement Learning

KW - Reinforcement Learning

KW - Sample efficiency

KW - Starcraft II

UR - http://www.scopus.com/inward/record.url?scp=85203526586&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2404.09715

DO - 10.48550/arXiv.2404.09715

M3 - Conference contribution

AN - SCOPUS:85203526586

SN - 979-8-3503-5068-5

T3 - IEEE Conference on Computational Intelligence and Games, CIG

BT - Proceedings of the 2024 IEEE Conference on Games, CoG 2024

PB - IEEE Computer Society

T2 - 6th Annual IEEE Conference on Games, CoG 2024

Y2 - 5 August 2024 through 8 August 2024

ER -
