摘要

Based on the adaptive reward-shaping mechanism, we propose a novel gradient descent (GD) Sarsa(lambda) algorithm to solve the problems of ill initial performance and low convergence speed in the reinforcement learning tasks with continuous state space. Adaptive normalized radial basis ANRBF) network is used to shape reward. The reward-shaping mechanism propagates model knowledge to the learner in the form of the additional reward signal so that the initial performance and convergence speed can be improved effectively. A function approximation algorithm named ANRBF-GD-Sarsa(lambda) is proposed based on the ANRBF network. The convergence of ANRBF-GD-Sarsa(lambda) is analyzed theoretically. Experiments are conducted to show the good initial performance and high convergence speed of the proposed algorithm.

全文