Abstract

In this study, the optimal tracking control problem (OTCP) for affine non-linear continuous-time systems with completely unknown dynamics is addressed in a data-driven manner by introducing the reinforcement learning (RL) technique. Unlike existing methods for the OTCP, the proposed data-driven policy iteration (PI) method requires neither knowledge nor identification of the system dynamics, including both the drift dynamics and the input dynamics. To carry out the proposed method, the original OTCP is first reformulated by constructing an augmented system composed of the error system dynamics and the desired trajectory dynamics. Then, based on the augmented system, a data-driven PI algorithm, which introduces a discount factor to solve the OTCP, is implemented on an actor-critic neural network (NN) structure using only measured system data rather than exact knowledge of the system dynamics. Two NNs are used in the structure to generate the optimal cost and the optimal control policy, respectively, and their weights are updated by a least-squares approach that minimises the residual errors. The proposed method is an off-policy RL method, in which the data can be sampled arbitrarily over the state and input domains. Finally, simulation results are provided to show the effectiveness of the proposed method.
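The core loop described above (off-policy data collection, least-squares policy evaluation minimising a discounted Bellman residual, then policy improvement) can be sketched in a much simpler setting. The code below is a minimal illustration in a discrete-time scalar linear-quadratic case, which is an assumption made here for brevity: the paper itself treats affine non-linear continuous-time systems with NN approximators, and all names (`plant`, `features`, `policy_iteration`, the constants `A_TRUE`, `B_TRUE`) are hypothetical. The true dynamics appear only inside the black-box `plant` used to generate data; the learner never reads them, mirroring the model-free setting.

```python
# Minimal sketch (assumption: discrete-time scalar analogue, not the paper's
# continuous-time NN formulation). Dynamics are a black box used only to
# generate data; (x, u) are sampled arbitrarily, as in an off-policy scheme.
import random

A_TRUE, B_TRUE = 0.9, 0.5        # unknown to the learner; queried as a black box
Q, R, GAMMA = 1.0, 1.0, 0.95     # stage cost q*x^2 + r*u^2, discount factor

def plant(x, u):
    """Black-box system query: returns the next state."""
    return A_TRUE * x + B_TRUE * u

def features(x, u):
    """Quadratic Q-function basis: Q(x,u) = h1*x^2 + 2*h2*x*u + h3*u^2."""
    return [x * x, 2.0 * x * u, u * u]

def solve3(A, c):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    n = 3
    M = [row[:] + [c[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][j] * h[j] for j in range(r + 1, n))
        h[r] = s / M[r][r]
    return h

def policy_iteration(n_iters=15, n_samples=200, seed=0):
    rng = random.Random(seed)
    k = 0.0                       # initial admissible policy u = -k*x
    for _ in range(n_iters):
        # Policy evaluation: least-squares fit of the Q-weights h from
        # arbitrarily sampled (x, u) pairs, minimising the discounted
        # Bellman residual  Q(x,u) - [cost + gamma * Q(x', -k*x')].
        A = [[0.0] * 3 for _ in range(3)]
        c = [0.0] * 3
        for _ in range(n_samples):
            x = rng.uniform(-1.0, 1.0)
            u = rng.uniform(-1.0, 1.0)   # exploratory (off-policy) input
            xn = plant(x, u)
            cost = Q * x * x + R * u * u
            phi = features(x, u)
            phin = features(xn, -k * xn)
            psi = [phi[i] - GAMMA * phin[i] for i in range(3)]
            for i in range(3):
                c[i] += psi[i] * cost
                for j in range(3):
                    A[i][j] += psi[i] * psi[j]
        h = solve3(A, c)
        # Policy improvement: minimise Q over u, giving u = -(h2/h3)*x.
        k = h[1] / h[2]
    return k
```

Because the inputs are drawn independently of the current policy, the same batch of transitions could be reused across all iterations, which is the practical appeal of the off-policy formulation noted in the abstract.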