Details
| Original language | English |
| --- | --- |
| Pages (from-to) | 3553-3567 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Neural Networks and Learning Systems |
| Volume | 34 |
| Issue number | 7 |
| Publication status | Published - 18 Oct 2021 |
Abstract
This article develops two novel output feedback (OPFB) Q-learning algorithms, on-policy Q-learning and off-policy Q-learning, to solve the H∞ static OPFB control problem for linear discrete-time (DT) systems. The primary contribution of the proposed algorithms lies in a newly developed OPFB control algorithm formulation for completely unknown systems. Conditions for the existence of the optimal OPFB solution are given under the premise that the disturbance attenuation condition is satisfied. The convergence of the proposed Q-learning methods, and the difference and equivalence between the two algorithms, are rigorously proven. Moreover, considering the effects of the probing noise required for persistence of excitation (PE), the proposed off-policy Q-learning method has the advantage of being immune to probing noise and thus avoids biased solutions. Simulation results are presented to verify the effectiveness of the proposed approaches.
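For context, the sketch below gives the standard zero-sum game formulation that H∞ static OPFB problems for linear DT systems are typically cast as; it is illustrative only, and the symbols A, B, D, C, Q_y, R, γ, and K are assumptions for this sketch, not notation taken from the article.

```latex
% Illustrative sketch only: a standard zero-sum game formulation of the
% H-infinity static OPFB problem for linear DT systems. The symbols
% A, B, D, C, Q_y, R, gamma, K are assumed for illustration and are not
% taken from the article itself.
\begin{align*}
  x_{k+1} &= A x_k + B u_k + D w_k, \qquad y_k = C x_k,\\
  u_k     &= -K y_k \quad \text{(static output feedback)},\\
  V(x_k)  &= \sum_{i=k}^{\infty}\bigl( y_i^{\top} Q_y\, y_i + u_i^{\top} R\, u_i
             - \gamma^{2} w_i^{\top} w_i \bigr),\\
  \mathcal{Q}(x_k,u_k,w_k) &= y_k^{\top} Q_y\, y_k + u_k^{\top} R\, u_k
             - \gamma^{2} w_k^{\top} w_k + V(x_{k+1}).
\end{align*}
```

In this standard setting the controller u_k minimizes and the disturbance w_k maximizes the value, with γ the prescribed disturbance attenuation level; Q-learning estimates the Q-function from measured data, so the system matrices (A, B, D) need not be known.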
Keywords
- H∞ control, off-policy Q-learning, Q-learning, static output feedback (OPFB), zero-sum game
ASJC Scopus subject areas
- Computer Science(all)
- Software
- Computer Science Applications
- Computer Networks and Communications
- Artificial Intelligence
Cite this
In: IEEE Transactions on Neural Networks and Learning Systems, Vol. 34, No. 7, 18.10.2021, p. 3553-3567.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Data-Driven H∞ Optimal Output Feedback Control for Linear Discrete-Time Systems Based on Off-Policy Q-Learning
AU - Zhang, Li
AU - Fan, Jialu
AU - Xue, Wenqian
AU - Lopez, Victor G.
AU - Li, Jinna
AU - Chai, Tianyou
AU - Lewis, Frank L.
N1 - Funding Information: This work was supported in part by the NSFC under Grant 61991400, Grant 61991404, Grant 61533015, and Grant 62073158; in part by the 2020 Science and Technology Major Project of Liaoning Province under Grant 2020JH1/10100008; and in part by the Liaoning Revitalization Talents Program under Grant XLYC2007135.
PY - 2021/10/18
Y1 - 2021/10/18
N2 - This article develops two novel output feedback (OPFB) Q-learning algorithms, on-policy Q-learning and off-policy Q-learning, to solve the H∞ static OPFB control problem for linear discrete-time (DT) systems. The primary contribution of the proposed algorithms lies in a newly developed OPFB control algorithm formulation for completely unknown systems. Conditions for the existence of the optimal OPFB solution are given under the premise that the disturbance attenuation condition is satisfied. The convergence of the proposed Q-learning methods, and the difference and equivalence between the two algorithms, are rigorously proven. Moreover, considering the effects of the probing noise required for persistence of excitation (PE), the proposed off-policy Q-learning method has the advantage of being immune to probing noise and thus avoids biased solutions. Simulation results are presented to verify the effectiveness of the proposed approaches.
AB - This article develops two novel output feedback (OPFB) Q-learning algorithms, on-policy Q-learning and off-policy Q-learning, to solve the H∞ static OPFB control problem for linear discrete-time (DT) systems. The primary contribution of the proposed algorithms lies in a newly developed OPFB control algorithm formulation for completely unknown systems. Conditions for the existence of the optimal OPFB solution are given under the premise that the disturbance attenuation condition is satisfied. The convergence of the proposed Q-learning methods, and the difference and equivalence between the two algorithms, are rigorously proven. Moreover, considering the effects of the probing noise required for persistence of excitation (PE), the proposed off-policy Q-learning method has the advantage of being immune to probing noise and thus avoids biased solutions. Simulation results are presented to verify the effectiveness of the proposed approaches.
KW - H∞ control
KW - off-policy Q-learning
KW - Q-learning
KW - static output feedback (OPFB)
KW - zero-sum game
UR - http://www.scopus.com/inward/record.url?scp=85164272276&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2021.3112457
DO - 10.1109/TNNLS.2021.3112457
M3 - Article
C2 - 34662280
AN - SCOPUS:85164272276
VL - 34
SP - 3553
EP - 3567
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
SN - 2162-237X
IS - 7
ER -