Abstract

In recent years, reinforcement learning (RL) and approximate dynamic programming (ADP) have been widely studied in the artificial intelligence and machine learning communities. As an important class of RL and ADP methods, adaptive critic designs (ACDs) with function approximation have been studied for online learning control of nonlinear dynamical systems. However, constructing efficient feature representations for approximating value functions or policies remains a difficult problem. This paper proposes ACDs with graph Laplacian (GL), which integrate manifold learning methods into the feature representations of ACDs. An online learning control algorithm called graph Laplacian dual heuristic programming (GL-DHP) is presented, and its performance is analyzed both theoretically and empirically. Owing to the nonlinear approximation ability of GL-based feature representations, the GL-DHP method performs much better than previous DHP methods that use manually designed neural networks. Simulation results on learning control of a ball-and-plate system, a typical nonlinear dynamical system with continuous state and action spaces, demonstrate the effectiveness of the GL-DHP method.