Adiabatic Markov Decision Process: Convergence of Value Iteration Algorithm

Thai Duong<sup>*</sup>; Duong Nguyen Huu; Thinh Nguyen

doi:10.1115/1.4032875

Abstract

A Markov decision process (MDP) is a well-known framework for devising optimal decision-making strategies under uncertainty. Typically, the decision maker assumes a stationary environment characterized by a time-invariant transition probability matrix. In many real-world scenarios, however, this assumption is not justified, so the computed optimal strategy might not deliver the expected performance. In this paper, we study the performance of the classic value iteration algorithm for solving an MDP in nonstationary environments. Specifically, the nonstationary environment is modeled as a sequence of time-varying transition probability matrices governed by an adiabatic evolution inspired by quantum mechanics. We characterize the performance of the value iteration algorithm in terms of the rate of change of the underlying environment, measuring performance by the convergence rate to the optimal average reward. We show two examples of queuing systems that make use of our analysis framework.
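The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: it assumes a small finite MDP, uses relative value iteration (a standard average-reward variant of value iteration) with state 0 as the reference state, and stands in for the adiabatic evolution with a simple linear interpolation between two transition tensors. The names `interpolate`, `bellman_step`, and `run_drifting_value_iteration` are hypothetical, as is the indexing convention `P[action][state][next_state]` and `R[state][action]`.

```python
def interpolate(P0, P1, alpha):
    """Convex combination of two transition tensors, indexed [action][state][next_state].

    Assumption: a linear drift from P0 to P1 stands in for a slowly varying
    environment; the paper's adiabatic model may differ.
    """
    return [
        [[(1 - alpha) * p0 + alpha * p1 for p0, p1 in zip(row0, row1)]
         for row0, row1 in zip(mat0, mat1)]
        for mat0, mat1 in zip(P0, P1)
    ]


def bellman_step(P, R, h):
    """One relative value iteration step.

    Returns the current average-reward (gain) estimate and the updated
    relative values, normalized so that the reference state 0 has value 0.
    """
    n_states = len(h)
    n_actions = len(P)
    backed_up = [
        max(
            R[s][a] + sum(P[a][s][s2] * h[s2] for s2 in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]
    gain = backed_up[0] - h[0]
    return gain, [v - backed_up[0] for v in backed_up]


def run_drifting_value_iteration(P_start, P_end, R, n_steps):
    """Apply one Bellman step per time step while the transition tensor drifts.

    A larger n_steps means a slower environment change per iteration.
    Returns the sequence of gain estimates.
    """
    h = [0.0] * len(R)
    gains = []
    for t in range(n_steps):
        alpha = t / (n_steps - 1) if n_steps > 1 else 1.0
        P_t = interpolate(P_start, P_end, alpha)
        gain, h = bellman_step(P_t, R, h)
        gains.append(gain)
    return gains
```

Comparing the returned gain estimates against the optimal average reward of each instantaneous MDP, for different values of `n_steps`, gives a rough empirical view of how the rate of environmental change affects tracking. The sketch also assumes each instantaneous chain is aperiodic, which relative value iteration generally needs in order to converge.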

Publication date: 2016-06
