Abstract

Reinforcement learning, in which decision-making agents learn optimal policies through interaction with their environment, is an attractive paradigm for model-free, adaptive controller design. However, results for systems with continuous state and action variables are rare. In this paper, we present convergence results for optimal linear quadratic control of discrete-time linear stochastic systems. This work can be viewed as a generalization of previous work on deterministic linear systems. Key differences between the algorithms for deterministic and stochastic systems are highlighted through examples.

  • Publication date: 2010-8