Details
Original language | English |
---|---|
Title of host publication | Proceedings of the international conference on machine learning (ICML) |
Number of pages | 18 |
Publication status | Published electronically (E-pub) - 2021 |
Event | 38th International Conference on Machine Learning - Virtual. Duration: 18 July 2021 → 24 July 2021 |
Abstract

Reinforcement learning is a powerful approach to learn behaviour through interactions with an environment. However, behaviours are usually learned in a purely reactive fashion, where an appropriate action is selected based on an observation. In this form, it is challenging to learn when it is necessary to execute new decisions. This makes learning inefficient, especially in environments that need various degrees of fine and coarse control. To address this, we propose a proactive setting in which the agent not only selects an action in a state but also for how long to commit to that action. Our TempoRL approach introduces skip connections between states and learns a skip-policy for repeating the same action along these skips. We demonstrate the effectiveness of TempoRL on a variety of traditional and deep RL environments, showing that our approach is capable of learning successful policies up to an order of magnitude faster than vanilla Q-learning.
Cite
- Standard
- Harvard
- APA
- Vancouver
- BibTeX
- RIS
Biedenkapp A, Rajan R, Hutter F, Lindauer M. TempoRL: Learning When to Act. Proceedings of the international conference on machine learning (ICML). 2021.
Publication: Contribution to book/report/collection/conference proceedings › Conference paper › Research › Peer-reviewed
TY - GEN
T1 - TempoRL: Learning When to Act
AU - Biedenkapp, André
AU - Rajan, Raghu
AU - Hutter, Frank
AU - Lindauer, Marius
N1 - Accepted at ICML'21
PY - 2021
Y1 - 2021
N2 - Reinforcement learning is a powerful approach to learn behaviour through interactions with an environment. However, behaviours are usually learned in a purely reactive fashion, where an appropriate action is selected based on an observation. In this form, it is challenging to learn when it is necessary to execute new decisions. This makes learning inefficient, especially in environments that need various degrees of fine and coarse control. To address this, we propose a proactive setting in which the agent not only selects an action in a state but also for how long to commit to that action. Our TempoRL approach introduces skip connections between states and learns a skip-policy for repeating the same action along these skips. We demonstrate the effectiveness of TempoRL on a variety of traditional and deep RL environments, showing that our approach is capable of learning successful policies up to an order of magnitude faster than vanilla Q-learning.
AB - Reinforcement learning is a powerful approach to learn behaviour through interactions with an environment. However, behaviours are usually learned in a purely reactive fashion, where an appropriate action is selected based on an observation. In this form, it is challenging to learn when it is necessary to execute new decisions. This makes learning inefficient, especially in environments that need various degrees of fine and coarse control. To address this, we propose a proactive setting in which the agent not only selects an action in a state but also for how long to commit to that action. Our TempoRL approach introduces skip connections between states and learns a skip-policy for repeating the same action along these skips. We demonstrate the effectiveness of TempoRL on a variety of traditional and deep RL environments, showing that our approach is capable of learning successful policies up to an order of magnitude faster than vanilla Q-learning.
KW - cs.LG
M3 - Conference contribution
BT - Proceedings of the international conference on machine learning (ICML)
T2 - 38th International Conference on Machine Learning
Y2 - 18 July 2021 through 24 July 2021
ER -
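The abstract above describes the core idea of TempoRL: the agent picks an action and, additionally, how many steps to commit to it via a learned skip-policy. The following is a minimal illustrative sketch of that idea in tabular form, not the authors' implementation: the class name, method names, and the single n-step update are assumptions for illustration (the paper's method also updates intermediate transitions along each skip, which this sketch omits).

```python
import random
from collections import defaultdict

class TempoRLAgentSketch:
    """Hypothetical tabular sketch of a behaviour policy Q(s, a) plus a
    skip-policy Q_skip((s, a), j) choosing how long to repeat action a."""

    def __init__(self, actions, max_skip, alpha=0.5, gamma=0.99, eps=0.1):
        self.q = defaultdict(float)        # Q(s, a): which action to take
        self.skip_q = defaultdict(float)   # Q_skip((s, a), j): how long to repeat it
        self.actions = list(actions)
        self.skips = list(range(1, max_skip + 1))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, s):
        # epsilon-greedy action selection
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def skip(self, s, a):
        # epsilon-greedy skip-length selection, conditioned on (state, action)
        if random.random() < self.eps:
            return random.choice(self.skips)
        return max(self.skips, key=lambda j: self.skip_q[((s, a), j)])

    def update(self, s, a, j, reward, s_next, done):
        # n-step target over the j repeated steps; `reward` is assumed to be
        # the discounted sum of the j per-step rewards
        target = reward
        if not done:
            target += self.gamma ** j * max(self.q[(s_next, b)] for b in self.actions)
        self.skip_q[((s, a), j)] += self.alpha * (target - self.skip_q[((s, a), j)])
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
```

In a training loop one would call `act` to choose an action, `skip` to choose how long to hold it, execute it for that many environment steps, then call `update` with the accumulated discounted reward; the skip-policy lets the agent make fewer decisions where coarse control suffices.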