Abstract

Temporal difference (TD) methods are a class of algorithms for learning predictions in multistep prediction problems; their most important application is temporal credit assignment in reinforcement learning. Although TD procedures are sound in theory and in principle, their success hinges on the proper selection of parameter values. Moreover, their learning relies heavily on repeated exposure to the same experience, which is not always practical or feasible. This paper addresses the efficient and general implementation of TD for hardware realizations of reinforcement learning algorithms by synthesizing the time series of discounted sums of rewards. The proposed algorithm eliminates all step-size parameters and improves data efficiency through a synthesis approach based on Grey theory; the stability of the proposed algorithm is also analyzed from the viewpoint of Grey theory. The algorithm, together with a critic-actor reinforcement learning model, is implemented on a System-on-a-Programmable-Chip (SOPC) board. Experimental results, including a comparison with the well-known adaptive heuristic critic (AHC) model, demonstrate that the proposed control mechanism can learn to control a system with very little a priori knowledge. Moreover, the effect of uncertainty in the interactions between the system and the environment is mitigated to some extent during the learning process of the proposed reinforcement learning agent.
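
For context, the sketch below illustrates the conventional baseline the paper improves upon: tabular TD(0) estimating the discounted sum of rewards on a toy Markov chain. The hand-tuned step size `alpha`, the discount factor, and the chain itself are illustrative assumptions; this is standard TD prediction, not the proposed step-size-free Grey-theory algorithm.

```python
import numpy as np

# Minimal sketch (assumed example, not the paper's method): tabular TD(0)
# learning the discounted sum of rewards on a small right-moving chain.
# The step size `alpha` is exactly the kind of parameter the proposed
# Grey-theory algorithm claims to eliminate.

n_states = 5         # states 0..4, state 4 is terminal
gamma = 0.9          # discount factor (illustrative)
alpha = 0.1          # hand-tuned step size used by conventional TD
V = np.zeros(n_states)

for episode in range(500):
    s = 0
    while s < n_states - 1:
        s_next = s + 1                              # deterministic transition
        r = 1.0 if s_next == n_states - 1 else 0.0  # reward on reaching the goal
        # TD(0) update toward the one-step bootstrapped target;
        # the bootstrap term is zero at the terminal state.
        target = r + gamma * V[s_next] * (s_next < n_states - 1)
        V[s] += alpha * (target - V[s])
        s = s_next

print(V)  # V[s] approaches gamma**(n_states - 2 - s)
```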

  • Publication date: 2011-12