Adaptive Adversarial Multi-Armed Bandit Approach to Two-Person Zero-Sum Markov Games

Chang Hyeong Soo*; Hu Jiaqiao; Fu Michael C; Marcus Steven I

doi:10.1109/TAC.2009.2036333

Source: IEEE Transactions on Automatic Control, 2010, 55(2): 463-468.

Abstract

This technical note presents a recursive sampling-based algorithm for finite horizon two-person zero-sum Markov games (MGs) based on the Exp3 algorithm developed by Auer et al. for adaptive adversarial multi-armed bandit problems. We provide a finite-iteration bound to the equilibrium value of the induced "sample average approximation game" of a given MG and prove asymptotic convergence to the equilibrium value of the given MG. The time and space complexities of the algorithm are independent of the state space of the game.
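For readers unfamiliar with Exp3, the sketch below shows the basic exponential-weights update for a single adversarial bandit with rewards in [0, 1]. It is a minimal textbook-style illustration under stated assumptions, not the recursive sampling algorithm of the technical note: the function name `exp3`, the `reward_fn(t, arm)` callback, the fixed exploration rate `gamma`, and the weight renormalization step are all hypothetical choices made for this example.

```python
import math
import random


def exp3(reward_fn, num_arms, num_rounds, gamma=0.1, seed=0):
    """Run a basic Exp3 loop; returns (final arm probabilities, total reward).

    reward_fn(t, arm) is an assumed callback returning a reward in [0, 1].
    """
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    total_reward = 0.0

    def mixed_probs(ws):
        # Mix the normalized weights with uniform exploration.
        s = sum(ws)
        return [(1.0 - gamma) * w / s + gamma / num_arms for w in ws]

    for t in range(num_rounds):
        probs = mixed_probs(weights)
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward

        # Importance-weighted estimate: only the pulled arm is updated.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)

        # Rescale so the largest weight is 1 (numerical safety only;
        # the resulting probabilities are unchanged).
        max_w = max(weights)
        weights = [w / max_w for w in weights]

    return mixed_probs(weights), total_reward
```

As a usage example, `exp3(lambda t, arm: 1.0 if arm == 2 else 0.0, num_arms=3, num_rounds=2000)` should concentrate most of the probability mass on arm 2, while the `gamma / num_arms` term keeps every arm's probability bounded away from zero.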

Publication date: 2010-02
