Abstract

In reinforcement learning problems with large-scale or continuous state and action spaces, approximate reinforcement learning methods fit the policy with function approximators. Least-squares approximation extracts more useful information from the samples and can be applied effectively in online algorithms. Because of the complexity of many reinforcement learning problems, samples often cannot be generated from the target policy itself to evaluate that policy, so off-policy methods have to be used. Eligibility traces can usually accelerate the convergence of an algorithm. This paper proposes off-policy least-squares algorithms with eligibility traces based on importance reweighting: OFP-LSPE-Q and OFP-LSTD-Q. The derivation indicates that, in the off-policy setting, the proposed algorithms converge faster as the sample size increases, compared with traditional least-squares reinforcement learning methods.
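To make the idea of importance-reweighted least-squares evaluation concrete, the following is a minimal sketch of an off-policy LSTD-Q(λ) update with linear features, in the spirit of the abstract. The function names, arguments (`phi`, `target_policy`, `behavior_policy`, `reg`), and the exact placement of the importance weight on the eligibility trace are illustrative assumptions, not the paper's OFP-LSTD-Q/OFP-LSPE-Q definitions, which may differ in detail.

```python
import numpy as np

def off_policy_lstd_q_lambda(transitions, phi, target_policy, behavior_policy,
                             n_features, gamma=0.99, lam=0.8, reg=1e-6):
    """Sketch of off-policy LSTD-Q(lambda) with importance-reweighted traces.

    transitions: iterable of (s, a, r, s_next, a_next) collected under the
        behavior policy; a_next is assumed to follow the target policy.
    phi(s, a): feature vector of a state-action pair (length n_features).
    target_policy(a, s), behavior_policy(a, s): action probabilities.
    """
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    z = np.zeros(n_features)  # eligibility trace

    for (s, a, r, s_next, a_next) in transitions:
        rho = target_policy(a, s) / behavior_policy(a, s)  # importance weight
        z = rho * (gamma * lam * z + phi(s, a))            # reweighted trace
        A += np.outer(z, phi(s, a) - gamma * phi(s_next, a_next))
        b += z * r

    # Least-squares solution; a small ridge term keeps A invertible
    # when the sample size is small.
    theta = np.linalg.solve(A + reg * np.eye(n_features), b)
    return theta
```

Under this sketch, the estimate of the action-value weights `theta` is recomputed from the accumulated statistics `A` and `b`, so adding more behavior-policy samples only refines the same least-squares system.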

Full text