A Parallel Framework of Adaptive Dynamic Programming Algorithm With Off-Policy Learning.

Clicks: 217
ID: 110447
2020
Article Quality & Performance Metrics
Overall Quality Improving Quality
0.0 /100
Combines engagement data with AI-assessed academic quality
AI Quality Assessment
Not analyzed
Abstract
In this article, a model-free online adaptive dynamic programming (ADP) approach is developed for solving the optimal control problem of nonaffine nonlinear systems. Combining the off-policy learning mechanism with the parallel paradigm, multithread agents are employed to collect the transitions by interacting with the environment that significantly augments the number of sampled data. On the other hand, each thread agent explores the environment with different initial states under its own behavior policy that enhances the exploration capability and alleviates the correlation between the sampled data. After the policy evaluation process, only one step update is required for policy improvement based on the policy gradient method. The stability of the system under iterative control laws is guaranteed. Moreover, the convergence analysis is given to prove that the iterative Q-function is monotonically nonincreasing and finally converges to the solution of the Hamilton-Jacobi-Bellman (HJB) equation. For implementing the algorithm, the actor-critic (AC) structure is utilized with two neural networks (NNs) to approximate the Q-function and the control policy. Finally, the effectiveness of the proposed algorithm is verified by two numerical examples.
Reference Key
sun2020aieee Use this key to autocite in the manuscript while using SciMatic Manuscript Manager or Thesis Manager
Authors Sun, Changyin;Li, Xiaofeng;Sun, Yuewen;
Journal IEEE Transactions on Neural Networks and Learning Systems
Year 2020
DOI 10.1109/TNNLS.2020.3015767
URL
Keywords

Citations

No citations found. To add a citation, contact the admin at info@scimatic.org

No comments yet. Be the first to comment on this article.