Abstract
In this note, we show that the evaluation phase of the policy iteration algorithm for the infinite-horizon discounted Markov decision problem can be carried out in O(mN^2) operations, where N is the number of states of the Markov decision process and m is the number of states whose decision changes during the policy improvement phase.
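The abstract does not say how the O(mN^2) bound is reached. The sketch below shows one standard way to get it. It is an assumption, not necessarily the note's own method. The idea is to keep the inverse of I - βP for the current policy, where β is the discount factor and P is the policy's transition matrix. When the improvement phase changes the decision in state i, only row i of P changes, so I - βP receives a rank-one update. The Sherman-Morrison formula then refreshes the inverse in O(N^2). For m changed states this costs O(mN^2), and the new value vector is one more O(N^2) matrix-vector product. All function names (`system_matrix`, `invert`, `update_row`, `evaluate`, `evaluate_after_improvement`) and the small 3-state example are hypothetical illustrations. The initial inverse is formed once by Gauss-Jordan elimination in O(N^3).

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def system_matrix(p: Matrix, beta: float) -> Matrix:
    """Return A = I - beta * P for a policy's N x N transition matrix P."""
    n = len(p)
    return [[(1.0 if i == j else 0.0) - beta * p[i][j] for j in range(n)]
            for i in range(n)]


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting: O(N^3), needed only once."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0.0:
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def update_row(inv: Matrix, beta: float, i: int,
               old_row: Vector, new_row: Vector) -> None:
    """Sherman-Morrison update, in place, of inv = (I - beta*P)^-1 when
    row i of P changes from old_row to new_row. Costs O(N^2).

    The new system matrix is A + u d^T with u = -beta * e_i and
    d = new_row - old_row, i.e. a rank-one change of A.
    """
    n = len(inv)
    d = [a - b for a, b in zip(new_row, old_row)]
    w = [sum(d[k] * inv[k][j] for k in range(n)) for j in range(n)]  # d^T A^-1
    z = [-beta * inv[r][i] for r in range(n)]                        # A^-1 u
    # 1 + d^T A^-1 u; nonzero for 0 <= beta < 1 with stochastic rows.
    denom = 1.0 - beta * w[i]
    for r in range(n):
        coef = z[r] / denom
        if coef != 0.0:
            inv[r] = [x - coef * y for x, y in zip(inv[r], w)]


def evaluate(inv: Matrix, reward: Vector) -> Vector:
    """Policy value v = (I - beta*P)^-1 r as one O(N^2) product."""
    return [sum(a * b for a, b in zip(row, reward)) for row in inv]


def evaluate_after_improvement(inv: Matrix, beta: float, old_p: Matrix,
                               new_p: Matrix, reward: Vector) -> Vector:
    """One rank-one update per changed row (m of them), then evaluate:
    O(m N^2) in total. inv is left holding the new policy's inverse."""
    for i in range(len(old_p)):
        if old_p[i] != new_p[i]:
            update_row(inv, beta, i, old_p[i], new_p[i])
    return evaluate(inv, reward)


# Hypothetical 3-state example: the improvement step changes only state 1.
beta = 0.9
p_old = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
p_new = [[0.5, 0.5, 0.0], [1.0, 0.0, 0.0], [0.5, 0.0, 0.5]]
r_new = [1.0, 2.0, 0.0]

inv = invert(system_matrix(p_old, beta))
v_fast = evaluate_after_improvement(inv, beta, p_old, p_new, r_new)
v_direct = evaluate(invert(system_matrix(p_new, beta)), r_new)
```

Sequential rank-one updates are used here for readability. A single rank-m Woodbury update of the inverse reaches the same O(mN^2 + m^3) = O(mN^2) bound, since m ≤ N.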
- Publication date: 1999-11
- Affiliation: The University of Hong Kong