摘要

We present algorithms for finding optimal strategies for discounted, infinite-horizon, Determinsitc Markov Decision Processes (DMDPs). Our fastest algorithm has a worst-case running time of O(mn), improving the recent bound of O(mn(2)) obtained by Andersson and Vorbyov [ 2006]. We also present a randomized O(m(1/2)n(2))-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving an O(mn2)-time algorithm that can be obtained using ideas of Papadimitriou and Tsitsiklis [1987].

  • 出版日期2010-3