摘要

This paper investigates properties of the optimality equation and optimal policies in discrete time Markov decision processes with expected discounted total rewards under weak conditions that the model is well defined and the optimality equation is true. The optimal value function is characterized as a solution of the optimality equation and the structure of optimal policies is also given.