An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem

Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research › Peer review

Authors

Victor G. Lopez, Matthias A. Müller

Organisational units


Details

Original language: English
Title of host publication: 2023 62nd IEEE Conference on Decision and Control, CDC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 13-19
Number of pages: 7
ISBN (electronic): 9798350301243
Publication status: Published - 2023
Event: 62nd IEEE Conference on Decision and Control, CDC 2023 - Singapore, Singapore
Duration: 13 Dec 2023 - 15 Dec 2023

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (print): 0743-1546
ISSN (electronic): 2576-2370

Abstract

In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.
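
For context on the approach, the sketch below shows the classical model-based Kleinman policy iteration for the continuous-time LQR problem, which alternates a policy evaluation step (here a Lyapunov equation) with a policy improvement step. The off-policy algorithm described in the abstract produces such iterates from measured input-state data without knowledge of the system matrices and formulates policy evaluation as a Sylvester-transpose equation instead; the code is only a minimal model-based reference, and the matrices A, B, Q, R are assumed example values, not taken from the paper.

# Minimal model-based sketch (NOT the paper's data-driven algorithm):
# Kleinman policy iteration for the continuous-time LQR problem.
# A, B, Q, R below are assumed example values for illustration only.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])   # assumed (Hurwitz) system matrix
B = np.array([[0.0],
              [1.0]])          # assumed input matrix
Q = np.eye(2)                  # state weighting matrix
R = np.eye(1)                  # input weighting matrix

K = np.zeros((1, 2))           # initial stabilizing gain (valid here since A is Hurwitz)
for _ in range(20):
    Acl = A - B @ K
    # Policy evaluation: solve Acl^T P + P Acl + Q + K^T R K = 0
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B^T P
    K_new = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

# The iterates converge to the optimal LQR gain obtained from the Riccati equation.
P_are = solve_continuous_are(A, B, Q, R)
K_opt = np.linalg.solve(R, B.T @ P_are)
print(np.allclose(K, K_opt))   # expected output: True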

ASJC Scopus subject areas

Cite

An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. / Lopez, Victor G.; Müller, Matthias A.
2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 13-19 (Proceedings of the IEEE Conference on Decision and Control).


Lopez, VG & Müller, MA 2023, An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. in 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Proceedings of the IEEE Conference on Decision and Control, Institute of Electrical and Electronics Engineers Inc., pp. 13-19, 62nd IEEE Conference on Decision and Control, CDC 2023, Singapore, Singapore, 13 Dec. 2023. https://doi.org/10.48550/arXiv.2303.17819, https://doi.org/10.1109/CDC49753.2023.10384256
Lopez, V. G., & Müller, M. A. (2023). An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. In 2023 62nd IEEE Conference on Decision and Control, CDC 2023 (pp. 13-19). (Proceedings of the IEEE Conference on Decision and Control). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.48550/arXiv.2303.17819, https://doi.org/10.1109/CDC49753.2023.10384256
Lopez VG, Müller MA. An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. In: 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers Inc. 2023. p. 13-19. (Proceedings of the IEEE Conference on Decision and Control). doi: 10.48550/arXiv.2303.17819, 10.1109/CDC49753.2023.10384256
Lopez, Victor G. ; Müller, Matthias A. / An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 13-19 (Proceedings of the IEEE Conference on Decision and Control).
BibTeX
@inproceedings{9a6ff5fd88d14fc686e05b4765efb615,
title = "An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem",
abstract = "In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently ex-citing input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.",
author = "Lopez, {Victor G.} and M{\"u}ller, {Matthias A.}",
year = "2023",
doi = "10.48550/arXiv.2303.17819",
language = "English",
series = "Proceedings of the IEEE Conference on Decision and Control",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "13--19",
booktitle = "2023 62nd IEEE Conference on Decision and Control, CDC 2023",
address = "United States",
note = "62nd IEEE Conference on Decision and Control, CDC 2023 ; Conference date: 13-12-2023 Through 15-12-2023",

}

RIS

TY - GEN

T1 - An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem

AU - Lopez, Victor G.

AU - Müller, Matthias A.

PY - 2023

Y1 - 2023

N2 - In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.

AB - In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.

UR - http://www.scopus.com/inward/record.url?scp=85184817776&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2303.17819

DO - 10.48550/arXiv.2303.17819

M3 - Conference contribution

AN - SCOPUS:85184817776

T3 - Proceedings of the IEEE Conference on Decision and Control

SP - 13

EP - 19

BT - 2023 62nd IEEE Conference on Decision and Control, CDC 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 62nd IEEE Conference on Decision and Control, CDC 2023

Y2 - 13 December 2023 through 15 December 2023

ER -
