Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

Publication: Chapter in book/report/anthology/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

  • Linjie Xu
  • Zichuan Liu
  • Alexander Dockhorn
  • Diego Perez-Liebana
  • Jinyu Wang
  • Lei Song
  • Jiang Bian

External organizations

  • Queen Mary University of London
  • Nanjing University
  • Microsoft Corporation

Details

Original language: English
Title of host publication: Proceedings of the 2024 IEEE Conference on Games, CoG 2024
Publisher: IEEE Computer Society
ISBN (electronic): 9798350350678
ISBN (print): 979-8-3503-5068-5
Publication status: Published - 5 Aug 2024
Event: 6th Annual IEEE Conference on Games, CoG 2024 - Milan, Italy
Duration: 5 Aug 2024 - 8 Aug 2024

Publication series

Name: IEEE Conference on Computational Intelligence and Games, CIG
ISSN (print): 2325-4270
ISSN (electronic): 2325-4289

Abstract

One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single-agent RL, achieving sample efficiency in Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency, we look at the widely used episodic training mechanism: in each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate 3 MARL methods on 6 SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency of MARL algorithms. The code to reproduce the results presented in this paper is open-sourced at https://github.com/egg-west/rr-for-MARL.
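The abstract's core idea, taking several gradient steps per round of environment interaction instead of one, can be sketched in a few lines. The Python below is a minimal illustration, not the authors' implementation from the linked repository: the gym-style env interface, the Agent and collect_episode placeholders, and all hyperparameter values are assumptions made for this sketch.

import random

class Agent:
    """Placeholder agent; a real MARL learner would go here."""
    def act(self, obs):
        return 0                     # placeholder action selection

    def gradient_update(self, batch):
        pass                         # placeholder for one gradient step on a batch

def collect_episode(env, agent):
    """Roll out one episode with a gym-style environment and return its transitions."""
    obs, done, transitions = env.reset(), False, []
    while not done:
        action = agent.act(obs)
        next_obs, reward, done, info = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
    return transitions

def train(env, agent, episodes=1000, replay_ratio=4, batch_size=32, buffer_size=5000):
    """Episodic training with a configurable replay ratio: after every collected
    episode, take replay_ratio gradient steps instead of the usual single one."""
    buffer = []
    for _ in range(episodes):
        buffer.append(collect_episode(env, agent))   # environment interaction
        buffer = buffer[-buffer_size:]               # keep only the newest episodes
        for _ in range(replay_ratio):                # replay_ratio > 1 reuses collected data
            batch = random.sample(buffer, min(batch_size, len(buffer)))
            agent.gradient_update(batch)

Setting replay_ratio to 1 recovers the one-update-per-collection scheme the abstract criticizes; the paper's finding is that larger values markedly improve sample efficiency on SMAC tasks.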


Cite

Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. / Xu, Linjie; Liu, Zichuan; Dockhorn, Alexander et al.
Proceedings of the 2024 IEEE Conference on Games, CoG 2024. IEEE Computer Society, 2024. (IEEE Conference on Computational Intelligence and Games, CIG).


Xu, L, Liu, Z, Dockhorn, A, Perez-Liebana, D, Wang, J, Song, L & Bian, J 2024, Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. in Proceedings of the 2024 IEEE Conference on Games, CoG 2024. IEEE Conference on Computational Intelligence and Games, CIG, IEEE Computer Society, 6th Annual IEEE Conference on Games, CoG 2024, Milan, Italy, 5 Aug 2024. https://doi.org/10.48550/arXiv.2404.09715, https://doi.org/10.1109/CoG60054.2024.10645658
Xu, L., Liu, Z., Dockhorn, A., Perez-Liebana, D., Wang, J., Song, L., & Bian, J. (2024). Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. In Proceedings of the 2024 IEEE Conference on Games, CoG 2024 (IEEE Conference on Computational Intelligence and Games, CIG). IEEE Computer Society. https://doi.org/10.48550/arXiv.2404.09715, https://doi.org/10.1109/CoG60054.2024.10645658
Xu L, Liu Z, Dockhorn A, Perez-Liebana D, Wang J, Song L et al. Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. In: Proceedings of the 2024 IEEE Conference on Games, CoG 2024. IEEE Computer Society. 2024. (IEEE Conference on Computational Intelligence and Games, CIG). doi: 10.48550/arXiv.2404.09715, 10.1109/CoG60054.2024.10645658
Xu, Linjie ; Liu, Zichuan ; Dockhorn, Alexander et al. / Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. Proceedings of the 2024 IEEE Conference on Games, CoG 2024. IEEE Computer Society, 2024. (IEEE Conference on Computational Intelligence and Games, CIG).
BibTeX
@inproceedings{8dee49296a804a38ad3697efe4e17ea6,
title = "Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning",
abstract = "One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency, we look at the widely used episodic training mechanism. In each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of the gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate 3 MARL methods on 6 SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency for MARL algorithms. The codes to reimplement the results presented in this paper are open-sourced at https://github.com/egg-west/rr-for-MARL.",
keywords = "Multi-Agent Reinforcement Learning, Reinforcement Learning, Sample efficiency, StarCraft II",
author = "Linjie Xu and Zichuan Liu and Alexander Dockhorn and Diego Perez-Liebana and Jinyu Wang and Lei Song and Jiang Bian",
note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 6th Annual IEEE Conference on Games, CoG 2024 ; Conference date: 05-08-2024 Through 08-08-2024",
year = "2024",
month = aug,
day = "5",
doi = "10.48550/arXiv.2404.09715",
language = "English",
isbn = "979-8-3503-5068-5",
series = "IEEE Conference on Computational Intelligence and Games, CIG",
publisher = "IEEE Computer Society",
booktitle = "Proceedings of the 2024 IEEE Conference on Games, CoG 2024",
address = "United States",

}

RIS

TY - GEN

T1 - Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

AU - Xu, Linjie

AU - Liu, Zichuan

AU - Dockhorn, Alexander

AU - Perez-Liebana, Diego

AU - Wang, Jinyu

AU - Song, Lei

AU - Bian, Jiang

N1 - Publisher Copyright: © 2024 IEEE.

PY - 2024/8/5

Y1 - 2024/8/5

N2 - One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency, we look at the widely used episodic training mechanism. In each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of the gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate 3 MARL methods on 6 SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency for MARL algorithms. The codes to reimplement the results presented in this paper are open-sourced at https://github.com/egg-west/rr-for-MARL.

AB - One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency, we look at the widely used episodic training mechanism. In each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of the gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate 3 MARL methods on 6 SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency for MARL algorithms. The codes to reimplement the results presented in this paper are open-sourced at https://github.com/egg-west/rr-for-MARL.

KW - Multi-Agent Reinforcement Learning

KW - Reinforcement Learning

KW - Sample efficiency

KW - StarCraft II

UR - http://www.scopus.com/inward/record.url?scp=85203526586&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2404.09715

DO - 10.48550/arXiv.2404.09715

M3 - Conference contribution

AN - SCOPUS:85203526586

SN - 979-8-3503-5068-5

T3 - IEEE Conference on Computational Intelligence and Games, CIG

BT - Proceedings of the 2024 IEEE Conference on Games, CoG 2024

PB - IEEE Computer Society

T2 - 6th Annual IEEE Conference on Games, CoG 2024

Y2 - 5 August 2024 through 8 August 2024

ER -
