An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time

Fairbank Michael<sup>*</sup>; Alonso Eduardo; Prokhorov Danil

doi:10.1109/TNNLS.2013.2271778

Abstract

We consider the adaptive dynamic programming technique called Dual Heuristic Programming (DHP), which is designed to learn a critic function, when using learned model functions of the environment. DHP is designed for optimizing control problems in large and continuous state spaces. We extend DHP into a new algorithm that we call Value-Gradient Learning, VGL(λ), and prove equivalence of an instance of the new algorithm to Backpropagation Through Time for Control with a greedy policy. Not only does this equivalence provide a link between these two different approaches, but it also enables our variant of DHP to have guaranteed convergence, under certain smoothness conditions and a greedy policy, when using a general smooth nonlinear function approximator for the critic. We consider several experimental scenarios including some that prove divergence of DHP under a greedy policy, which contrasts against our proven-convergent algorithm.

Publication date: 2013-12
