An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes

Shalabh Bhatnagar*

doi:10.1016/j.sysconle.2010.08.013

Abstract

We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy.
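The Lagrangian relaxation and the SPSA-based actor step mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it leaves out the TD critic and function approximation, replaces the sampled discounted costs with a deterministic toy cost, uses constant step sizes instead of diminishing multi-timescale schedules, and uses a two-sided SPSA estimate that may differ from the variant in the paper. All names (`spsa_gradient`, `solve_constrained`, `actor_step`, `multiplier_step`) and the toy problem are assumptions made for illustration.

```python
import random


def spsa_gradient(f, theta, delta, rng):
    """Two-sided SPSA gradient estimate of f at theta (illustrative variant)."""
    # Perturb every coordinate at once with independent +/-1 signs.
    signs = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + delta * s for t, s in zip(theta, signs)]
    minus = [t - delta * s for t, s in zip(theta, signs)]
    # Two function evaluations give an estimate for all coordinates.
    diff = f(plus) - f(minus)
    return [diff / (2.0 * delta * s) for s in signs]


def solve_constrained(cost, constraint, bound, theta, steps=5000,
                      actor_step=0.1, multiplier_step=0.01,
                      delta=0.01, seed=0):
    """Minimize cost(theta) subject to constraint(theta) <= bound.

    Hypothetical helper: descends on the Lagrangian in theta (faster step)
    and ascends in the multiplier (slower step), keeping it nonnegative.
    """
    rng = random.Random(seed)
    lam = 0.0
    for _ in range(steps):
        # Lagrangian with the current multiplier held fixed.
        lagrangian = lambda th: cost(th) + lam * (constraint(th) - bound)
        grad = spsa_gradient(lagrangian, theta, delta, rng)
        violation = constraint(theta) - bound
        # Actor-like step: gradient descent on the policy parameter.
        theta = [t - actor_step * g for t, g in zip(theta, grad)]
        # Multiplier step: ascent on the violation, projected onto [0, inf).
        lam = max(0.0, lam + multiplier_step * violation)
    return theta, lam


# Toy problem (assumed): minimize (theta - 2)^2 subject to theta <= 1.
theta, lam = solve_constrained(
    cost=lambda th: (th[0] - 2.0) ** 2,
    constraint=lambda th: th[0],
    bound=1.0,
    theta=[0.0],
)
print(theta, lam)
```

For this toy problem the constrained minimizer is theta = 1 with multiplier 2, and the iterates should settle near those values. The multiplier moves on the slower step size so that the parameter update sees it as nearly constant, which is the intuition behind the timescale separation.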

Publication date: 2010-12

