Using Learning Classifier Systems to Learn Stochastic Decision Policies

Chen Gang<sup>*</sup>; Douch Colin I J; Zhang Mengjie

doi:10.1109/TEVC.2015.2415464

Abstract

To solve reinforcement learning problems, many learning classifier systems (LCSs) are designed to learn state-action value functions through a compact set of maximally general and accurate rules. Most of these systems focus on learning deterministic policies by using a greedy action selection strategy. In practice, however, it may be more flexible and desirable to learn stochastic policies, which can be considered direct extensions of their deterministic counterparts. In this paper, we aim to achieve this goal by extending each rule with a new policy parameter. We also introduce a new method, based on a policy gradient framework, for adaptively learning stochastic action selection strategies. Using this method, we have developed two new learning systems, one based on a regular gradient learning technique and the other based on a new natural gradient learning method. Both learning systems have been evaluated on three different types of reinforcement learning problems. The promising performance of the two systems shows that LCSs provide a suitable platform for efficient and reliable learning of stochastic policies.
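The idea of attaching a policy parameter to each rule and adjusting it by a policy gradient can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the rule representation (a dict with hypothetical keys `matches`, `action`, `theta`), the softmax form of the stochastic policy, and the REINFORCE-style regular-gradient update. The paper's actual parameterization, and its natural gradient variant, may differ.

```python
import math
import random

def softmax(prefs):
    """Turn action preferences into selection probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def action_probabilities(rules, state, n_actions):
    """Assumed policy: each action's preference is the sum of the policy
    parameters (theta) of the matching rules that advocate it."""
    prefs = [0.0] * n_actions
    for rule in rules:
        if rule["matches"](state):
            prefs[rule["action"]] += rule["theta"]
    return softmax(prefs)

def sample_action(rules, state, n_actions):
    """Stochastic action selection instead of a greedy argmax."""
    probs = action_probabilities(rules, state, n_actions)
    return random.choices(range(n_actions), weights=probs)[0]

def policy_gradient_step(rules, state, action, ret, n_actions, lr=0.1):
    """Regular-gradient (REINFORCE-style) update for the matching rules:
    theta += lr * return * d log pi(action | state) / d theta."""
    probs = action_probabilities(rules, state, n_actions)
    for rule in rules:
        if rule["matches"](state):
            indicator = 1.0 if rule["action"] == action else 0.0
            rule["theta"] += lr * ret * (indicator - probs[rule["action"]])

# Two hypothetical rules that match every state and advocate different actions.
rules = [
    {"matches": lambda s: True, "action": 0, "theta": 0.0},
    {"matches": lambda s: True, "action": 1, "theta": 0.0},
]
policy_gradient_step(rules, state=0, action=0, ret=1.0, n_actions=2)
print(action_probabilities(rules, 0, 2))  # action 0 is now slightly preferred
```

In this sketch a positive return after taking an action raises the parameter of the rules that advocated it and lowers the parameters of competing rules, so the policy stays stochastic while shifting probability toward rewarded actions.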

Publication date: 2015-12
