Data-Driven H∞ Optimal Output Feedback Control for Linear Discrete-Time Systems Based on Off-Policy Q-Learning

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Li Zhang
  • Jialu Fan
  • Wenqian Xue
  • Victor G. Lopez
  • Jinna Li
  • Tianyou Chai
  • Frank L. Lewis

Research Organisations

External Research Organisations

  • Northeastern University, Shenyang (NEU)
  • Liaoning Petrochemical University
  • University of Texas at Arlington

Details

Original language: English
Pages (from-to): 3553-3567
Number of pages: 15
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 34
Issue number: 7
Publication status: Published - 18 Oct 2021

Abstract

This article develops two novel output feedback (OPFB) Q-learning algorithms, on-policy Q-learning and off-policy Q-learning, to solve the H∞ static OPFB control problem for linear discrete-time (DT) systems. The primary contribution of the proposed algorithms lies in a newly developed form of OPFB control law for completely unknown systems. Conditions for the existence of the optimal OPFB solution are given under the premise that the disturbance attenuation condition is satisfied. The convergence of the proposed Q-learning methods, as well as the differences and equivalence of the two algorithms, are rigorously proven. Moreover, considering the effects of the probing noise required for persistence of excitation (PE), the proposed off-policy Q-learning method has the advantage of being immune to probing noise, thus avoiding a biased solution. Simulation results are presented to verify the effectiveness of the proposed approaches.
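As a reading aid only, the sketch below is a hypothetical Python/NumPy illustration of the kind of machinery the abstract refers to: a Q-learning loop for a zero-sum (control versus disturbance) game on a linear DT system, in which exploratory ("probing") inputs generate the data, a quadratic Q-function kernel H is fitted by least squares, and the saddle-point gains are read off from the partitions of H. It is not the paper's algorithm: the paper derives a static output feedback (OPFB) law that weights the output y = Cx and removes probing-noise bias through its off-policy formulation, whereas this sketch uses full state feedback, a state-weighting cost, and plain value iteration. All matrices, dimensions, and the attenuation level gamma are invented for illustration.

# A minimal illustrative sketch (NOT the paper's algorithm): data-driven
# zero-sum-game Q-learning for a linear DT system, fitted by least squares.
# All matrices, dimensions, and gamma below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plant: x_{k+1} = A x_k + B u_k + E w_k
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
E = np.array([[0.1], [0.0]])
n, m, q = 2, 1, 1                          # state, control, disturbance sizes
Qc, R, gamma = np.eye(n), np.eye(m), 5.0   # stage-cost weights, attenuation level

def sym_basis(z):
    """Quadratic features so that z^T H z = sym_basis(z) @ theta."""
    outer = np.outer(z, z)
    rows, cols = np.triu_indices(len(z))
    scale = np.where(rows == cols, 1.0, 2.0)   # off-diagonal terms appear twice
    return outer[rows, cols] * scale

def theta_to_H(theta, dim):
    """Rebuild the symmetric kernel matrix H from its upper-triangular entries."""
    U = np.zeros((dim, dim))
    U[np.triu_indices(dim)] = theta
    return U + U.T - np.diag(np.diag(U))

# Behaviour data: exploratory (probing) inputs, independent of the target policy
N = 60
X = np.zeros((N + 1, n))
U = np.zeros((N, m))
W = np.zeros((N, q))
X[0] = rng.normal(size=n)
for k in range(N):
    U[k] = rng.normal(size=m)
    W[k] = 0.3 * rng.normal(size=q)
    X[k + 1] = A @ X[k] + B @ U[k] + E @ W[k]

K = np.zeros((m, n))   # target control gain,      u = -K x
L = np.zeros((q, n))   # target disturbance gain,  w =  L x
dim = n + m + q
H = np.zeros((dim, dim))

for _ in range(30):
    Phi, tgt = [], []
    for k in range(N):
        z = np.concatenate([X[k], U[k], W[k]])
        r = X[k] @ Qc @ X[k] + U[k] @ R @ U[k] - gamma**2 * W[k] @ W[k]
        # next-step actions follow the *target* policies, not the logged data
        z_next = np.concatenate([X[k + 1], -K @ X[k + 1], L @ X[k + 1]])
        Phi.append(sym_basis(z))
        tgt.append(r + z_next @ H @ z_next)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(tgt), rcond=None)
    H = theta_to_H(theta, dim)

    # Partition H = [[Hxx, Hxu, Hxw], [., Huu, Huw], [., ., Hww]] and update
    # the saddle-point gains of the zero-sum game.
    Hxu, Hxw = H[:n, n:n + m], H[:n, n + m:]
    Huu, Huw, Hww = H[n:n + m, n:n + m], H[n:n + m, n + m:], H[n + m:, n + m:]
    K = np.linalg.solve(Huu - Huw @ np.linalg.solve(Hww, Huw.T),
                        Hxu.T - Huw @ np.linalg.solve(Hww, Hxw.T))
    L = np.linalg.solve(Hww - Huw.T @ np.linalg.solve(Huu, Huw),
                        Huw.T @ np.linalg.solve(Huu, Hxu.T) - Hxw.T)

print("control gain K:", K)
print("worst-case disturbance gain L:", L)

The off-policy flavour mirrored here is that the logged inputs U and W come from exploratory behaviour policies, while the regression targets are evaluated under the current target gains K and L, so the same data can be reused as the target policies change.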

Keywords

    H∞ control, off-policy Q-learning, Q-learning, static output feedback (OPFB), zero-sum game

ASJC Scopus subject areas

Cite this

Data-Driven H∞ Optimal Output Feedback Control for Linear Discrete-Time Systems Based on Off-Policy Q-Learning. / Zhang, Li; Fan, Jialu; Xue, Wenqian et al.
In: IEEE Transactions on Neural Networks and Learning Systems, Vol. 34, No. 7, 18.10.2021, p. 3553-3567.

Research output: Contribution to journal › Article › Research › peer review

Zhang L, Fan J, Xue W, Lopez VG, Li J, Chai T et al. Data-Driven H∞ Optimal Output Feedback Control for Linear Discrete-Time Systems Based on Off-Policy Q-Learning. IEEE Transactions on Neural Networks and Learning Systems. 2021 Oct 18;34(7):3553-3567. doi: 10.1109/TNNLS.2021.3112457
BibTeX
@article{d9bc7c1487d0485c852325f4a6c51e33,
title = "Data-Driven H∞ Optimal Output Feedback Control for Linear Discrete-Time Systems Based on Off-Policy Q-Learning",
abstract = "This article develops two novel output feedback (OPFB) Q-learning algorithms, on-policy Q-learning and off-policy Q-learning, to solve H∞ static OPFB control problem of linear discrete-time (DT) systems. The primary contribution of the proposed algorithms lies in a newly developed OPFB control algorithm form for completely unknown systems. Under the premise of satisfying disturbance attenuation conditions, the conditions for the existence of the optimal OPFB solution are given. The convergence of the proposed Q-learning methods, and the difference and equivalence of two algorithms are rigorously proven. Moreover, considering the effects brought by probing noise for the persistence of excitation (PE), the proposed off-policy Q-learning method has the advantage of being immune to probing noise and avoiding biasedness of solution. Simulation results are presented to verify the effectiveness of the proposed approaches.",
keywords = "Hcontrol, off-policy Q-learning, Q-learning, static output feedback (OPFB), zero-sum game",
author = "Li Zhang and Jialu Fan and Wenqian Xue and Lopez, {Victor G.} and Jinna Li and Tianyou Chai and Lewis, {Frank L.}",
note = "Funding Information: This work was supported in part by the NSFC under Grant 61991400, Grant 61991404, Grant 61533015, and Grant 62073158; in part by the 2020 Science and Technology Major Project of Liaoning Province under Grant 2020JH1/10100008; and in part by the Liaoning Revitalization Talents Program under Grant XLYC2007135.",
year = "2021",
month = oct,
day = "18",
doi = "10.1109/TNNLS.2021.3112457",
language = "English",
volume = "34",
pages = "3553--3567",
journal = "IEEE Transactions on Neural Networks and Learning Systems",
issn = "2162-237X",
publisher = "IEEE Computational Intelligence Society",
number = "7",

}

RIS

TY - JOUR

T1 - Data-Driven H∞ Optimal Output Feedback Control for Linear Discrete-Time Systems Based on Off-Policy Q-Learning

AU - Zhang, Li

AU - Fan, Jialu

AU - Xue, Wenqian

AU - Lopez, Victor G.

AU - Li, Jinna

AU - Chai, Tianyou

AU - Lewis, Frank L.

N1 - Funding Information: This work was supported in part by the NSFC under Grant 61991400, Grant 61991404, Grant 61533015, and Grant 62073158; in part by the 2020 Science and Technology Major Project of Liaoning Province under Grant 2020JH1/10100008; and in part by the Liaoning Revitalization Talents Program under Grant XLYC2007135.

PY - 2021/10/18

Y1 - 2021/10/18

N2 - This article develops two novel output feedback (OPFB) Q-learning algorithms, on-policy Q-learning and off-policy Q-learning, to solve the H∞ static OPFB control problem for linear discrete-time (DT) systems. The primary contribution of the proposed algorithms lies in a newly developed form of OPFB control law for completely unknown systems. Conditions for the existence of the optimal OPFB solution are given under the premise that the disturbance attenuation condition is satisfied. The convergence of the proposed Q-learning methods, as well as the differences and equivalence of the two algorithms, are rigorously proven. Moreover, considering the effects of the probing noise required for persistence of excitation (PE), the proposed off-policy Q-learning method has the advantage of being immune to probing noise, thus avoiding a biased solution. Simulation results are presented to verify the effectiveness of the proposed approaches.

AB - This article develops two novel output feedback (OPFB) Q-learning algorithms, on-policy Q-learning and off-policy Q-learning, to solve the H∞ static OPFB control problem for linear discrete-time (DT) systems. The primary contribution of the proposed algorithms lies in a newly developed form of OPFB control law for completely unknown systems. Conditions for the existence of the optimal OPFB solution are given under the premise that the disturbance attenuation condition is satisfied. The convergence of the proposed Q-learning methods, as well as the differences and equivalence of the two algorithms, are rigorously proven. Moreover, considering the effects of the probing noise required for persistence of excitation (PE), the proposed off-policy Q-learning method has the advantage of being immune to probing noise, thus avoiding a biased solution. Simulation results are presented to verify the effectiveness of the proposed approaches.

KW - H∞ control

KW - off-policy Q-learning

KW - Q-learning

KW - static output feedback (OPFB)

KW - zero-sum game

UR - http://www.scopus.com/inward/record.url?scp=85164272276&partnerID=8YFLogxK

U2 - 10.1109/TNNLS.2021.3112457

DO - 10.1109/TNNLS.2021.3112457

M3 - Article

C2 - 34662280

AN - SCOPUS:85164272276

VL - 34

SP - 3553

EP - 3567

JO - IEEE Transactions on Neural Networks and Learning Systems

JF - IEEE Transactions on Neural Networks and Learning Systems

SN - 2162-237X

IS - 7

ER -