Abstract

With the long-run average performance as the primary criterion for a Markov decision process, variance measures are studied as secondary criteria. The steady-state variance and the limiting average variance along a sample path are discussed. The latter is difficult to handle because of its special form. With a sensitivity-based approach, the difference formula for the sample-path variance under different policies is constructed intuitively, and the optimality equation is then presented. Moreover, a policy iteration algorithm is developed. This work extends the sensitivity-based construction approach to Markov decision processes with non-standard performance criteria. The difference between the two types of variance criteria and the bias criterion is illustrated with a numerical example.
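For orientation, the following is one common formalization of the two criteria in the average-reward MDP literature; the notation (stationary distribution $\pi$, reward function $f$, long-run average $\eta$, horizon $T$) is assumed here and may differ in detail from the definitions used in the paper.

```latex
% Hedged sketch of the two variance criteria (assumed notation,
% not necessarily the paper's): stationary distribution \pi,
% reward function f, long-run average reward \eta.
\[
\eta = \sum_{i} \pi(i)\, f(i), \qquad
\sigma^2_{\mathrm{ss}} = \sum_{i} \pi(i)\, \bigl(f(i) - \eta\bigr)^2
\quad \text{(steady-state variance)}
\]
\[
\sigma^2_{\mathrm{sp}} = \lim_{T \to \infty} \frac{1}{T}
\sum_{t=0}^{T-1} \bigl(f(X_t) - \eta_T\bigr)^2, \qquad
\eta_T = \frac{1}{T} \sum_{t=0}^{T-1} f(X_t)
\quad \text{(sample-path variance)}
\]
```

Under this formalization, the sample average $\eta_T$ appearing inside the square couples all time steps, so the sample-path variance is not an additive functional of the chain; this is the "special form" that makes it harder to handle than the steady-state variance.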
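As background for the policy iteration result, the sketch below shows standard policy iteration for the long-run average (gain) criterion alone, under a unichain assumption; it is illustrative only and does not implement the paper's variance-oriented difference formula or optimality equation. The names (`evaluate_policy`, `policy_iteration`) and the solve-with-normalization construction are assumptions for this sketch, not the paper's notation.

```python
import numpy as np

def evaluate_policy(P, r):
    """Solve the Poisson equation  g*1 + (I - P) h = r  for a fixed policy,
    pinning h[0] = 0 (valid under a unichain assumption).
    Returns the gain g and the bias vector h."""
    n = len(r)
    A = np.zeros((n, n))
    A[:, 0] = 1.0                      # column for the unknown gain g
    A[:, 1:] = (np.eye(n) - P)[:, 1:]  # columns for h[1], ..., h[n-1]
    x = np.linalg.solve(A, r)
    return x[0], np.concatenate(([0.0], x[1:]))

def policy_iteration(P, r, tol=1e-10):
    """P[a] is the transition matrix and r[a] the reward vector under
    action a; returns an average-optimal policy with its gain and bias."""
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Evaluation: gain and bias of the current policy.
        P_d = np.array([P[policy[s]][s] for s in range(n_states)])
        r_d = np.array([r[policy[s]][s] for s in range(n_states)])
        g, h = evaluate_policy(P_d, r_d)
        # Improvement: switch only on a strict gain in r(s,a) + P(s,a)h,
        # keeping the current action on ties to guarantee termination.
        new_policy = policy.copy()
        for s in range(n_states):
            q = np.array([r[a][s] + P[a][s] @ h for a in range(n_actions)])
            if q.max() > q[policy[s]] + tol:
                new_policy[s] = int(q.argmax())
        if np.array_equal(new_policy, policy):
            return policy, g, h
        policy = new_policy
```

A variance-minimizing variant in the spirit of the paper would replace the evaluation and improvement quantities with ones derived from the sample-path variance difference formula, while keeping this same evaluate-improve loop structure.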