Multiagent Reinforcement Learning with Regret Matching for Robot Soccer

Liu, Qiang<sup>*</sup>; Ma, Jiachen; Xie, Wei

doi:10.1155/2013/926267

Abstract

This paper proposes a novel multiagent reinforcement learning (MARL) algorithm, Nash-Q learning with regret matching, in which regret matching is used to speed up the well-known MARL algorithm Nash-Q learning. Choosing a suitable action-selection strategy that balances exploration and exploitation is critical to the online learning ability of Nash-Q learning. In a Markov game, the joint action of agents that adopt the regret matching algorithm can converge to a set of no-regret points, which can be viewed as a coarse correlated equilibrium, a class that in essence includes Nash equilibrium. It can therefore be inferred that regret matching can guide exploration of the state-action space and thereby increase the convergence rate of the Nash-Q learning algorithm. Simulation results on robot soccer show that, compared with the original Nash-Q learning algorithm, using regret matching during the learning phase of Nash-Q learning gives strong online learning ability and significantly better performance in terms of scores, average reward, and policy convergence.
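The abstract does not spell out how regret matching is wired into Nash-Q learning, so the following is only a minimal sketch of the generic regret matching rule: each action is chosen with probability proportional to its positive cumulative regret. The function names, and the idea of feeding per-action value estimates (for example, an agent's current Q-values in a state) in as `action_values`, are assumptions made for illustration and are not taken from the paper.

```python
import random


def regret_matching_strategy(cumulative_regret):
    """Return action probabilities proportional to positive cumulative regret.

    Falls back to a uniform distribution when no action has positive regret.
    """
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    n = len(cumulative_regret)
    if total <= 0.0:
        return [1.0 / n] * n
    return [p / total for p in positive]


def update_regret(cumulative_regret, action_values, chosen_action):
    """Accumulate, for each action, how much better it would have done
    than the action that was actually chosen.

    `action_values` is a hypothetical per-action payoff estimate; using
    Q-values here is an assumption, not the paper's stated method.
    """
    baseline = action_values[chosen_action]
    return [r + (v - baseline) for r, v in zip(cumulative_regret, action_values)]


def select_action(cumulative_regret, rng=random):
    """Sample an action index from the regret matching distribution."""
    probs = regret_matching_strategy(cumulative_regret)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In a learning loop, an agent would call `select_action` in place of a rule such as epsilon-greedy, observe the outcome, and then call `update_regret`, so that exploration concentrates on actions that would have paid off better in hindsight.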

Publication date: 2013
Affiliation: Harbin Institute of Technology

Last updated: 2021-07-17 16:09
