Centralized Optimization for Dec-POMDPs Under the Expected Average Reward Criterion

Jiang, Xiaofeng*; Wang, Xiaodong; Xi, Hongsheng; Liu, Falin

doi:10.1109/TAC.2017.2702203

Abstract

In this paper, decentralized partially observable Markov decision process (Dec-POMDP) systems with discrete state and action spaces are studied from a gradient point of view. Dec-POMDPs have recently emerged as a promising approach to optimizing multiagent decision making in partially observable stochastic environments. However, the decentralized nature of the Dec-POMDP framework results in a lack of a shared belief state, which makes it impossible for a decision maker to estimate the system state based on local information. In contrast to belief-based policies, this paper focuses on optimizing the decentralized observation-based policy, which is easy to apply and does not suffer from the sharing problem. By analyzing the gradient of the objective function, we develop a centralized stochastic gradient policy iteration algorithm that searches for the optimal policy on the basis of gradient estimates from a single sample path. This algorithm does not require any specific assumptions and can be applied to most practical Dec-POMDP problems. A numerical example is provided to demonstrate the effectiveness of the algorithm.
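
The abstract gives no algorithmic detail, so the following is only a minimal illustrative sketch of the general idea: observation-based policies for several agents, trained centrally with a gradient of the average reward estimated from a single sample path. It is not the authors' algorithm. The gradient estimator is a GPOMDP-style eligibility-trace estimate (a swapped-in, well-known technique), and the two-state, two-agent toy problem, the softmax parameterization, and every name and constant (`observe`, `step`, `beta`, `lr`, and so on) are assumptions made purely for illustration.

```python
import math
import random

N_AGENTS, N_OBS, N_ACT = 2, 2, 2  # hypothetical toy problem sizes

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def observe(state, rng, accuracy=0.8):
    # Each agent privately sees the true state with probability `accuracy`.
    return state if rng.random() < accuracy else 1 - state

def step(state, actions, rng):
    # Reward 1 only if every agent's action matches the hidden state.
    success = all(a == state for a in actions)
    flip_prob = 0.5 if success else 0.1
    next_state = 1 - state if rng.random() < flip_prob else state
    return next_state, (1.0 if success else 0.0)

def zeros():
    return [[[0.0] * N_ACT for _ in range(N_OBS)] for _ in range(N_AGENTS)]

def estimate_gradient(theta, horizon, rng, beta=0.9):
    """GPOMDP-style gradient estimate from one sample path (assumed estimator)."""
    z, grad = zeros(), zeros()  # eligibility trace and running gradient average
    state, avg_reward = 0, 0.0
    for t in range(horizon):
        # Decay the whole trace once per time step.
        for i in range(N_AGENTS):
            for o in range(N_OBS):
                for a in range(N_ACT):
                    z[i][o][a] *= beta
        actions = []
        for i in range(N_AGENTS):
            o = observe(state, rng)  # local observation only
            probs = softmax(theta[i][o])
            a = rng.choices(range(N_ACT), weights=probs)[0]
            actions.append(a)
            # Score function of the softmax policy: d log pi(a|o) / d theta.
            for b in range(N_ACT):
                z[i][o][b] += (1.0 if b == a else 0.0) - probs[b]
        state, reward = step(state, actions, rng)
        avg_reward += (reward - avg_reward) / (t + 1)
        for i in range(N_AGENTS):
            for o in range(N_OBS):
                for a in range(N_ACT):
                    grad[i][o][a] += (reward * z[i][o][a] - grad[i][o][a]) / (t + 1)
    return grad, avg_reward

def train(iterations=200, horizon=500, lr=1.0, seed=0):
    # Centralized training: one learner updates all agents' parameters.
    rng = random.Random(seed)
    theta = zeros()
    history = []
    for _ in range(iterations):
        grad, avg_reward = estimate_gradient(theta, horizon, rng)
        history.append(avg_reward)
        for i in range(N_AGENTS):
            for o in range(N_OBS):
                for a in range(N_ACT):
                    theta[i][o][a] += lr * grad[i][o][a]  # gradient ascent
    return theta, history
```

Calling `train()` returns the learned logits and the per-iteration average reward observed along each sample path. At execution time each agent needs only its own row `theta[i]` and its local observation, which is what makes the resulting policy decentralized even though the optimization is centralized.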

Publication date: 2017-11
Affiliation: University of Science and Technology of China
