Abstract

This brief paper presents a policy-improvement method for generating a feasible stochastic policy π̃ from a given feasible stochastic base policy π such that π̃ improves all of the feasible policies "induced" from π for infinite-horizon constrained discounted controlled Markov chains (CMCs). A policy-iteration heuristic for approximately solving constrained discounted CMCs is developed from this improvement method.
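The abstract does not spell out the improvement step, but the general flavor of such heuristics can be illustrated on a toy example. The sketch below, which is an assumption for exposition and not the paper's method, evaluates a feasible stochastic base policy on a tiny constrained discounted MDP, computes the policy greedy with respect to the reward, and mixes it with the base policy using the largest mixing weight that keeps the discounted cost within a budget; all numbers and the mixing rule are illustrative.

```python
import numpy as np

# Illustrative 2-state, 2-action constrained discounted MDP.
# All numbers below are assumptions for exposition, not from the paper.
gamma = 0.9
mu = np.array([0.5, 0.5])               # initial state distribution
P = np.array([                          # P[a, s, s'] transition kernels
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.1, 0.9], [0.6, 0.4]],
])
R = np.array([[1.0, 0.0], [2.0, 0.5]])  # R[a, s] reward
C = np.array([[0.0, 1.0], [1.5, 0.2]])  # C[a, s] constraint cost
budget = 7.0                            # discounted-cost upper bound

def evaluate(pi, F):
    """Discounted value of stochastic policy pi[s, a] for stage payoff F[a, s]."""
    P_pi = np.einsum('sa,ast->st', pi, P)       # state-transition matrix under pi
    f_pi = np.einsum('sa,as->s', pi, F)         # expected stage payoff under pi
    return np.linalg.solve(np.eye(2) - gamma * P_pi, f_pi)

def improve(pi):
    """One heuristic step: mix pi with the reward-greedy policy, using the
    largest mixing weight whose discounted cost stays within the budget
    (the base policy is assumed feasible, so weight 0 always works)."""
    v_r = evaluate(pi, R)
    Q = R + gamma * np.einsum('ast,t->as', P, v_r)  # Q[a, s]
    greedy = np.eye(2)[Q.argmax(axis=0)]            # deterministic greedy pi'[s, a]
    for alpha in np.linspace(1.0, 0.0, 11):
        mix = (1 - alpha) * pi + alpha * greedy
        if mu @ evaluate(mix, C) <= budget:
            return mix
    return pi

base = np.full((2, 2), 0.5)             # uniform, feasible base policy
better = improve(base)
print("base  reward:", mu @ evaluate(base, R), " cost:", mu @ evaluate(base, C))
print("mixed reward:", mu @ evaluate(better, R), " cost:", mu @ evaluate(better, C))
```

Because the mixture puts the remaining probability mass on the base policy itself, the standard policy-improvement argument still applies, so the mixed policy's reward is no worse than the base policy's while the line search preserves cost feasibility.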

  • Publication date: 2012-10