An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem

Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research › Peer review

Authors

Victor G. Lopez, Matthias A. Müller

Organisational units


Details

Original language: English
Title of host publication: 2023 62nd IEEE Conference on Decision and Control, CDC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 13-19
Number of pages: 7
ISBN (electronic): 9798350301243
Publication status: Published - 2023
Event: 62nd IEEE Conference on Decision and Control, CDC 2023 - Singapore, Singapore
Duration: 13 Dec 2023 - 15 Dec 2023

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (print): 0743-1546
ISSN (electronic): 2576-2370

Abstract

In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.
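
For context on the approach, the sketch below shows the classical model-based Kleinman policy iteration for the continuous-time LQR problem, which alternates a policy evaluation step (here a Lyapunov equation) with a policy improvement step. The off-policy algorithm described in the abstract produces such iterates from measured input-state data without knowledge of the system matrices and formulates policy evaluation as a Sylvester-transpose equation instead; the code is only a minimal model-based reference, and the matrices A, B, Q, R are assumed example values, not taken from the paper.

# Minimal model-based sketch (NOT the paper's data-driven algorithm):
# Kleinman policy iteration for the continuous-time LQR problem.
# A, B, Q, R below are assumed example values for illustration only.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])   # assumed (Hurwitz) system matrix
B = np.array([[0.0],
              [1.0]])          # assumed input matrix
Q = np.eye(2)                  # state weighting matrix
R = np.eye(1)                  # input weighting matrix

K = np.zeros((1, 2))           # initial stabilizing gain (valid here since A is Hurwitz)
for _ in range(20):
    Acl = A - B @ K
    # Policy evaluation: solve Acl^T P + P Acl + Q + K^T R K = 0
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B^T P
    K_new = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

# The iterates converge to the optimal LQR gain obtained from the Riccati equation.
P_are = solve_continuous_are(A, B, Q, R)
K_opt = np.linalg.solve(R, B.T @ P_are)
print(np.allclose(K, K_opt))   # expected output: True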

ASJC Scopus subject areas

Cite

An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. / Lopez, Victor G.; Müller, Matthias A.
2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 13-19 (Proceedings of the IEEE Conference on Decision and Control).


Lopez, VG & Müller, MA 2023, An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. in 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Proceedings of the IEEE Conference on Decision and Control, Institute of Electrical and Electronics Engineers Inc., pp. 13-19, 62nd IEEE Conference on Decision and Control, CDC 2023, Singapore, Singapore, 13 Dec. 2023. https://doi.org/10.48550/arXiv.2303.17819, https://doi.org/10.1109/CDC49753.2023.10384256
Lopez, V. G., & Müller, M. A. (2023). An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. In 2023 62nd IEEE Conference on Decision and Control, CDC 2023 (pp. 13-19). (Proceedings of the IEEE Conference on Decision and Control). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.48550/arXiv.2303.17819, https://doi.org/10.1109/CDC49753.2023.10384256
Lopez VG, Müller MA. An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. In: 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers Inc. 2023. p. 13-19. (Proceedings of the IEEE Conference on Decision and Control). doi: 10.48550/arXiv.2303.17819, 10.1109/CDC49753.2023.10384256
Lopez, Victor G. ; Müller, Matthias A. / An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 13-19 (Proceedings of the IEEE Conference on Decision and Control).
BibTeX
@inproceedings{9a6ff5fd88d14fc686e05b4765efb615,
title = "An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem",
abstract = "In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently ex-citing input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.",
author = "Lopez, {Victor G.} and M{\"u}ller, {Matthias A.}",
year = "2023",
doi = "10.48550/arXiv.2303.17819",
language = "English",
series = "Proceedings of the IEEE Conference on Decision and Control",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "13--19",
booktitle = "2023 62nd IEEE Conference on Decision and Control, CDC 2023",
address = "United States",
note = "62nd IEEE Conference on Decision and Control, CDC 2023 ; Conference date: 13-12-2023 Through 15-12-2023",

}

RIS

TY - GEN

T1 - An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem

AU - Lopez, Victor G.

AU - Müller, Matthias A.

PY - 2023

Y1 - 2023

N2 - In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.

AB - In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.

UR - http://www.scopus.com/inward/record.url?scp=85184817776&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2303.17819

DO - 10.48550/arXiv.2303.17819

M3 - Conference contribution

AN - SCOPUS:85184817776

T3 - Proceedings of the IEEE Conference on Decision and Control

SP - 13

EP - 19

BT - 2023 62nd IEEE Conference on Decision and Control, CDC 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 62nd IEEE Conference on Decision and Control, CDC 2023

Y2 - 13 December 2023 through 15 December 2023

ER -
