Abstract

Undiscounted Markov decision processes (UMDPs) can formulate optimal stochastic control problems that minimize the expected total cost per period for various systems. We propose new approximate dynamic programming (ADP) algorithms for large-scale UMDPs that can overcome the curses of dimensionality. These algorithms, called simulation-based modified policy iteration (SBMPI) algorithms, are extensions of the simulation-based modified policy iteration method (SBMPIM) (Ohno, 2011) for optimal control problems of multistage JIT-based production and distribution systems with stochastic demand and production capacity. The main new concepts of the SBMPI algorithms are that the simulation-based policy evaluation step of the SBMPIM is replaced by the partial policy evaluation step of the modified policy iteration method (MPIM), and that the algorithms start from the expected total cost per period and relative values estimated by simulating the system under a reasonable initial policy. For numerical comparison, the optimal control problem of a three-stage JIT-based production and distribution system with stochastic demand and production capacity is formulated as a UMDP. The demand distribution is changed from the shifted binomial distribution in Ohno (2011) to a Poisson distribution, and near-optimal policies of the optimal control problems with 35,973,840 states are computed by the SBMPI algorithms and the SBMPIM. The computational results show that the SBMPI algorithms are at least 100 times faster than the SBMPIM in solving the numerical problems and are robust with respect to initial policies. Numerical examples are solved to demonstrate the effectiveness of the near-optimal control computed by the SBMPI algorithms compared with optimized pull systems whose optimal parameters are computed using the SBOS (simulation-based optimal solutions) from Ohno (2011).
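For readers unfamiliar with the partial policy evaluation step mentioned above, the following is a minimal sketch of modified policy iteration for a tabular average-cost (undiscounted) MDP, assuming explicit transition probabilities P and one-period costs c. It illustrates the generic MPIM structure only; it is not the authors' SBMPI implementation, which replaces exact model quantities with simulation-based estimates.

```python
import numpy as np

def modified_policy_iteration_avg(P, c, m=20, max_iter=500, tol=1e-8):
    """Modified policy iteration for an average-cost (undiscounted) tabular MDP.

    P : array of shape (A, S, S), transition probabilities P[a, s, s'].
    c : array of shape (A, S), one-period costs c[a, s].
    m : number of partial policy-evaluation sweeps per iteration.
    Returns a policy, an estimate g of the expected cost per period,
    and relative values h (normalised so that h[ref] = 0).
    """
    A, S, _ = P.shape
    h = np.zeros(S)                 # relative values
    ref = 0                         # reference state for normalisation
    policy = np.zeros(S, dtype=int)
    g = 0.0

    for _ in range(max_iter):
        # Policy improvement: greedy action w.r.t. the current relative values.
        Q = c + np.einsum('ast,t->as', P, h)      # Q[a, s]
        new_policy = Q.argmin(axis=0)

        # Partial policy evaluation: m sweeps of the policy's Bellman operator,
        # renormalising at the reference state; the offset estimates the gain g.
        for _ in range(m):
            c_pi = c[new_policy, np.arange(S)]
            P_pi = P[new_policy, np.arange(S), :]
            Th = c_pi + P_pi @ h
            g = Th[ref]
            h_new = Th - g
            if (np.max(np.abs(h_new - h)) < tol
                    and np.array_equal(new_policy, policy)):
                return new_policy, g, h_new
            h = h_new
        policy = new_policy

    return policy, g, h
```

The parameter m controls how far each evaluation step is carried: m = 1 reduces to value iteration and m → ∞ approaches exact policy evaluation, which is the trade-off that partial policy evaluation exploits.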

  • Publication date: 2016-2-16