A GRADIENT DESCENT SARSA(lambda) ALGORITHM BASED ON THE ADAPTIVE REWARD-SHAPING MECHANISM

Liu, Quan*; Fu, QiMing; Xiao, Fei; Fu, YuChen

doi:10.1080/10798587.2013.869119

Abstract

Based on an adaptive reward-shaping mechanism, we propose a novel gradient descent (GD) Sarsa(lambda) algorithm to address the poor initial performance and slow convergence of reinforcement learning in tasks with continuous state spaces. An adaptive normalized radial basis function (ANRBF) network is used to shape the reward. The reward-shaping mechanism conveys model knowledge to the learner as an additional reward signal, which effectively improves both initial performance and convergence speed. Building on the ANRBF network, we propose a function approximation algorithm named ANRBF-GD-Sarsa(lambda) and analyze its convergence theoretically. Experimental results demonstrate the good initial performance and fast convergence of the proposed algorithm.
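The abstract does not give the update rules, so the following is only a minimal sketch of how a shaped GD Sarsa(lambda) learner of this kind can be put together, under stated assumptions: normalized Gaussian RBF features with fixed (not adaptive) centers and width, a linear action-value function with one weight vector per action, accumulating eligibility traces, and potential-based reward shaping (F = gamma * Phi(s') - Phi(s)) standing in for the "additional reward signal". The class names, the potential function and all parameter values are hypothetical; the paper's ANRBF shaping mechanism may work differently.

```python
import numpy as np


class NormalizedRBF:
    """Normalized Gaussian RBF features phi(s) for a continuous state.

    Assumption: centers and width are fixed here; the paper's ANRBF network
    adapts its structure, which this sketch does not reproduce.
    """

    def __init__(self, centers, width):
        self.centers = np.asarray(centers, dtype=float)  # shape (k, d)
        self.width = float(width)

    def __call__(self, state):
        diff = self.centers - np.asarray(state, dtype=float)
        act = np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * self.width ** 2))
        # Normalize so the k activations sum to 1 (guard against underflow).
        return act / max(act.sum(), 1e-12)


class ShapedGDSarsaLambda:
    """Linear GD Sarsa(lambda) with accumulating traces plus a shaping reward.

    Assumption: shaping is potential-based, F = gamma * Phi(s') - Phi(s);
    this is a standard form, not necessarily the paper's mechanism.
    """

    def __init__(self, features, n_actions, potential,
                 alpha=0.1, gamma=0.99, lam=0.9):
        self.phi = features
        self.potential = potential  # hypothetical callable: state -> float
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        k = len(features.centers)
        self.w = np.zeros((n_actions, k))  # one weight row per action
        self.e = np.zeros_like(self.w)     # eligibility traces

    def q(self, state, action):
        # Linear approximation: Q(s, a) = w[a] . phi(s)
        return float(self.w[action] @ self.phi(state))

    def start_episode(self):
        self.e[:] = 0.0

    def update(self, s, a, r, s_next, a_next, done):
        # Shaped reward; the potential of a terminal state is taken as 0.
        pot_next = 0.0 if done else self.potential(s_next)
        shaped_r = r + self.gamma * pot_next - self.potential(s)
        q_next = 0.0 if done else self.q(s_next, a_next)
        delta = shaped_r + self.gamma * q_next - self.q(s, a)
        # Accumulating trace: decay all traces, then add grad Q(s, a) = phi(s).
        self.e *= self.gamma * self.lam
        self.e[a] += self.phi(s)
        # Gradient-descent step along the traces.
        self.w += self.alpha * delta * self.e
        return delta


# Usage on a 1-D state with 5 RBF centers and a hypothetical potential
# that favors states near 0.5.
centers = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
agent = ShapedGDSarsaLambda(NormalizedRBF(centers, width=0.5), n_actions=2,
                            potential=lambda s: -abs(s[0] - 0.5))
agent.start_episode()
delta = agent.update(np.array([0.0]), 0, -1.0, np.array([0.1]), 1, done=False)
```

With all weights initially zero, the TD error here comes entirely from the shaped reward: -1 + 0.99 * (-0.4) - (-0.5) = -0.896, so moving toward the favored region softens the penalty. This is the intuition behind shaping improving early performance; whether the paper's adaptive potential behaves the same way is not stated in the abstract.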

Publication date: 2013
Affiliation: Soochow University