Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

Publication: Chapter in book/report/anthology/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

  • Linjie Xu
  • Zichuan Liu
  • Alexander Dockhorn
  • Diego Perez-Liebana
  • Jinyu Wang
  • Lei Song
  • Jiang Bian

External organizations

  • Queen Mary University of London
  • Nanjing University
  • Microsoft Corporation

Details

Original language: English
Title of host publication: Proceedings of the 2024 IEEE Conference on Games, CoG 2024
Publisher: IEEE Computer Society
ISBN (electronic): 9798350350678
ISBN (print): 979-8-3503-5068-5
Publication status: Published - 5 Aug 2024
Event: 6th Annual IEEE Conference on Games, CoG 2024 - Milan, Italy
Duration: 5 Aug 2024 - 8 Aug 2024

Publication series

Name: IEEE Conference on Computational Intelligence and Games, CIG
ISSN (print): 2325-4270
ISSN (electronic): 2325-4289

Abstract

One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single-agent RL, achieving sample efficiency in Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency, we look at the widely used episodic training mechanism: in each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate 3 MARL methods on 6 SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency of MARL algorithms. The code to reproduce the results presented in this paper is open-sourced at https://github.com/egg-west/rr-for-MARL.
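The abstract's core idea, taking several gradient steps per round of environment interaction instead of one, can be sketched in a few lines. The Python below is a minimal illustration, not the authors' implementation from the linked repository: the gym-style env interface, the Agent and collect_episode placeholders, and all hyperparameter values are assumptions made for this sketch.

import random

class Agent:
    """Placeholder agent; a real MARL learner would go here."""
    def act(self, obs):
        return 0                     # placeholder action selection

    def gradient_update(self, batch):
        pass                         # placeholder for one gradient step on a batch

def collect_episode(env, agent):
    """Roll out one episode with a gym-style environment and return its transitions."""
    obs, done, transitions = env.reset(), False, []
    while not done:
        action = agent.act(obs)
        next_obs, reward, done, info = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
    return transitions

def train(env, agent, episodes=1000, replay_ratio=4, batch_size=32, buffer_size=5000):
    """Episodic training with a configurable replay ratio: after every collected
    episode, take replay_ratio gradient steps instead of the usual single one."""
    buffer = []
    for _ in range(episodes):
        buffer.append(collect_episode(env, agent))   # environment interaction
        buffer = buffer[-buffer_size:]               # keep only the newest episodes
        for _ in range(replay_ratio):                # replay_ratio > 1 reuses collected data
            batch = random.sample(buffer, min(batch_size, len(buffer)))
            agent.gradient_update(batch)

Setting replay_ratio to 1 recovers the one-update-per-collection scheme the abstract criticizes; the paper's finding is that larger values markedly improve sample efficiency on SMAC tasks.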


Cite

Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. / Xu, Linjie; Liu, Zichuan; Dockhorn, Alexander et al.
Proceedings of the 2024 IEEE Conference on Games, CoG 2024. IEEE Computer Society, 2024. (IEEE Conference on Computational Intelligence and Games, CIG).


Xu, L, Liu, Z, Dockhorn, A, Perez-Liebana, D, Wang, J, Song, L & Bian, J 2024, Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. in Proceedings of the 2024 IEEE Conference on Games, CoG 2024. IEEE Conference on Computational Intelligence and Games, CIG, IEEE Computer Society, 6th Annual IEEE Conference on Games, CoG 2024, Milan, Italy, 5 Aug 2024. https://doi.org/10.48550/arXiv.2404.09715, https://doi.org/10.1109/CoG60054.2024.10645658
Xu, L., Liu, Z., Dockhorn, A., Perez-Liebana, D., Wang, J., Song, L., & Bian, J. (2024). Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. In Proceedings of the 2024 IEEE Conference on Games, CoG 2024 (IEEE Conference on Computational Intelligence and Games, CIG). IEEE Computer Society. https://doi.org/10.48550/arXiv.2404.09715, https://doi.org/10.1109/CoG60054.2024.10645658
Xu L, Liu Z, Dockhorn A, Perez-Liebana D, Wang J, Song L et al. Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. In: Proceedings of the 2024 IEEE Conference on Games, CoG 2024. IEEE Computer Society. 2024. (IEEE Conference on Computational Intelligence and Games, CIG). doi: 10.48550/arXiv.2404.09715, 10.1109/CoG60054.2024.10645658
Xu, Linjie ; Liu, Zichuan ; Dockhorn, Alexander et al. / Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning. Proceedings of the 2024 IEEE Conference on Games, CoG 2024. IEEE Computer Society, 2024. (IEEE Conference on Computational Intelligence and Games, CIG).
BibTeX
@inproceedings{8dee49296a804a38ad3697efe4e17ea6,
title = "Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning",
abstract = "One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency, we look at the widely used episodic training mechanism. In each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of the gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate 3 MARL methods on 6 SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency for MARL algorithms. The codes to reimplement the results presented in this paper are open-sourced at https://github.com/egg-west/rr-for-MARL.",
keywords = "Multi-Agent Reinforcement Learning, Reinforcement Learning, Sample efficiency, StarCraft II",
author = "Linjie Xu and Zichuan Liu and Alexander Dockhorn and Diego Perez-Liebana and Jinyu Wang and Lei Song and Jiang Bian",
note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 6th Annual IEEE Conference on Games, CoG 2024 ; Conference date: 05-08-2024 Through 08-08-2024",
year = "2024",
month = aug,
day = "5",
doi = "10.48550/arXiv.2404.09715",
language = "English",
isbn = "979-8-3503-5068-5",
series = "IEEE Conference on Computational Intelligence and Games, CIG",
publisher = "IEEE Computer Society",
booktitle = "Proceedings of the 2024 IEEE Conference on Games, CoG 2024",
address = "United States",

}

RIS

TY - GEN

T1 - Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

AU - Xu, Linjie

AU - Liu, Zichuan

AU - Dockhorn, Alexander

AU - Perez-Liebana, Diego

AU - Wang, Jinyu

AU - Song, Lei

AU - Bian, Jiang

N1 - Publisher Copyright: © 2024 IEEE.

PY - 2024/8/5

Y1 - 2024/8/5

N2 - One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency, we look at the widely used episodic training mechanism. In each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of the gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate 3 MARL methods on 6 SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency for MARL algorithms. The codes to reimplement the results presented in this paper are open-sourced at https://github.com/egg-west/rr-for-MARL.

AB - One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency, we look at the widely used episodic training mechanism. In each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of the gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate 3 MARL methods on 6 SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency for MARL algorithms. The codes to reimplement the results presented in this paper are open-sourced at https://github.com/egg-west/rr-for-MARL.

KW - Multi-Agent Reinforcement Learning

KW - Reinforcement Learning

KW - Sample efficiency

KW - StarCraft II

UR - http://www.scopus.com/inward/record.url?scp=85203526586&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2404.09715

DO - 10.48550/arXiv.2404.09715

M3 - Conference contribution

AN - SCOPUS:85203526586

SN - 979-8-3503-5068-5

T3 - IEEE Conference on Computational Intelligence and Games, CIG

BT - Proceedings of the 2024 IEEE Conference on Games, CoG 2024

PB - IEEE Computer Society

T2 - 6th Annual IEEE Conference on Games, CoG 2024

Y2 - 5 August 2024 through 8 August 2024

ER -
