Details
Original language | English |
---|---|
Title of host publication | 2023 62nd IEEE Conference on Decision and Control, CDC 2023 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 13-19 |
Number of pages | 7 |
ISBN (electronic) | 9798350301243 |
Publication status | Published - 2023 |
Event | 62nd IEEE Conference on Decision and Control, CDC 2023 - Singapore, Singapore. Duration: 13 Dec 2023 → 15 Dec 2023 |
Publication series
Name | Proceedings of the IEEE Conference on Decision and Control |
---|---|
ISSN (print) | 0743-1546 |
ISSN (electronic) | 2576-2370 |
Abstract
In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. In contrast to other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is also proposed. Finally, the advantages of the proposed method are demonstrated via simulation.
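For orientation, the policy evaluation/improvement loop the abstract refers to has a classical model-based counterpart, Kleinman's algorithm, in which each evaluation step solves a Lyapunov equation and each improvement step updates the feedback gain. The sketch below illustrates only that iteration structure, on a hypothetical two-state system; it is not the paper's method, which is off-policy, uses only measured input-state data (no knowledge of A and B), and replaces the evaluation step with a Sylvester-transpose equation assembled from persistently excited data.

```python
# Minimal model-based sketch of policy iteration for continuous-time LQR
# (Kleinman's algorithm). Illustration only: the paper's algorithm is
# data-driven and does not use the system matrices A and B directly.
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Hypothetical two-state example system (not from the paper).
A = np.array([[0.0, 1.0],
              [-1.0, -2.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)  # state cost
R = np.eye(1)  # input cost

K = np.zeros((1, 2))  # initial stabilizing gain (valid here: A is Hurwitz)
for _ in range(50):
    A_cl = A - B @ K
    # Policy evaluation: solve A_cl^T P + P A_cl = -(Q + K^T R K).
    P = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B^T P.
    K_next = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_next - K) < 1e-10:
        K = K_next
        break
    K = K_next

# The iterates converge to the solution of the algebraic Riccati equation,
# i.e., the optimal LQR cost matrix and gain.
P_are = solve_continuous_are(A, B, Q, R)
print(np.allclose(P, P_are, atol=1e-8))  # expected: True
```

The final comparison against the Riccati solution simply confirms convergence to the optimal LQR controller, mirroring the convergence guarantee stated in the abstract for the data-driven setting.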
ASJC Scopus subject areas
- Engineering (all)
- Control and Systems Engineering
- Mathematics (all)
- Modeling and Simulation
- Mathematics (all)
- Control and Optimization
Cite
2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 13-19 (Proceedings of the IEEE Conference on Decision and Control).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer-reviewed
RIS
TY - GEN
T1 - An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem
AU - Lopez, Victor G.
AU - Müller, Matthias A.
PY - 2023
Y1 - 2023
N2 - In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. In contrast to other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is also proposed. Finally, the advantages of the proposed method are demonstrated via simulation.
AB - In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. In contrast to other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is also proposed. Finally, the advantages of the proposed method are demonstrated via simulation.
UR - http://www.scopus.com/inward/record.url?scp=85184817776&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2303.17819
DO - 10.48550/arXiv.2303.17819
M3 - Conference contribution
AN - SCOPUS:85184817776
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 13
EP - 19
BT - 2023 62nd IEEE Conference on Decision and Control, CDC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 62nd IEEE Conference on Decision and Control, CDC 2023
Y2 - 13 December 2023 through 15 December 2023
ER -