摘要

We investigate Whittle index policy for a restless bandit model whose state space may be enlarged under passive actions.This model arises in many important applications and extends the classical model introduced by Whittle in 1988.In the classical model,one chooses a subset of arms to play at each time and accrue certain reward determined by the states of all arms and the subset of chosen arms.The state of each arm evolves according to a Markov process whose parameters(transition matrix) may depend on whether or not the arm is selected.The objective is to maximize the time-average reward over long-term.Weber and Weiss in 1990 proved the asymptotic optimality of Whittle index under a sufficient condition for the classical model,where Whittle’s indexability was required.In this paper,we extend Whittle index to the general model as considered here.Our extension is based on policy continuation and tie-breaking ordering of Whittle index when new states join the system.By requiring a positive recurrent sub-state-space and boundedness of immediate rewards,we show that randomization can achieve optimality under Whittle’s relaxed constraint.We further analyze the fluid dynamics of our model and show that the asymptotic optimality of Whittle index under the strict constraint can also be extended.

  • 出版日期2020