Abstract

In this paper, we present a rapid learning algorithm called Dyna-QPC. The proposed algorithm requires considerably less training time than Q-learning and the table-based Dyna-Q algorithm, making it applicable to real-world control tasks. The Dyna-QPC algorithm is a combination of existing learning techniques: CMAC, Q-learning, and prioritized sweeping. In a practical experiment, the Dyna-QPC algorithm is implemented with the goal of minimizing the learning time required for a robot to navigate a discrete state space containing obstacles. The robot learning agent uses Q-learning for policy learning and a CMAC model as an approximator of the system environment. The prioritized sweeping technique maintains a queue of previously influential state-action pairs, which is consumed by a planning function. The planning function is implemented as a background task that updates the learning policy using previous experience stored in the approximation model. Because background tasks run during CPU idle time, they place no additional load on the system processor. The Dyna-QPC agent switches seamlessly between real and virtual modes with the objective of achieving rapid policy learning. A simulated and an experimental scenario have been designed and implemented: the simulated scenario is used to compare the speed and efficiency of the three learning algorithms, while the experimental scenario evaluates the new Dyna-QPC agent. Results from both scenarios demonstrate the superior performance of the proposed learning agent.

  • Publication date: 2014-11
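
The abstract describes the agent only at a high level. As a purely illustrative aid, the sketch below shows a minimal Dyna-style agent that combines direct Q-learning with prioritized-sweeping planning on a toy grid world; it is not the authors' implementation. In particular, the paper's Dyna-QPC uses a CMAC model of the environment, whereas this sketch substitutes a simple tabular model, and all names, grid dimensions, and hyper-parameters (learning rate, discount, priority threshold, number of planning steps) are assumptions chosen for illustration.

```python
# Illustrative sketch only: Dyna-style agent with direct Q-learning plus
# prioritized-sweeping planning on a toy deterministic grid world. The paper's
# Dyna-QPC uses a CMAC environment model; a simple tabular model stands in for
# it here, and every hyper-parameter below is an assumption, not the authors'.
import heapq
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # assumed learning rate, discount, exploration
THETA, N_PLANNING = 1e-3, 20             # assumed priority threshold, planning steps
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
GRID_W, GRID_H, GOAL = 6, 6, (5, 5)

Q = defaultdict(float)            # Q[(state, action)] -> value
model = {}                        # model[(state, action)] -> (reward, next_state)
predecessors = defaultdict(set)   # next_state -> set of (state, action) leading to it
pqueue = []                       # max-priority queue (priorities stored negated)


def env_step(state, action):
    """Toy deterministic environment: move on the grid, reward 1 at the goal."""
    x = min(max(state[0] + action[0], 0), GRID_W - 1)
    y = min(max(state[1] + action[1], 0), GRID_H - 1)
    nxt = (x, y)
    return (1.0 if nxt == GOAL else 0.0), nxt


def choose_action(state):
    """Epsilon-greedy action selection over the learned Q-values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def td_error(state, action, reward, nxt):
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    return reward + GAMMA * best_next - Q[(state, action)]


def planning_sweep():
    """Planning phase: replay the most 'influential' stored pairs first."""
    for _ in range(N_PLANNING):
        if not pqueue:
            break
        _, (s, a) = heapq.heappop(pqueue)
        r, s2 = model[(s, a)]
        Q[(s, a)] += ALPHA * td_error(s, a, r, s2)
        # Changing Q(s, .) may make updates to s's predecessors worthwhile too.
        for (ps, pa) in predecessors[s]:
            pr, _ = model[(ps, pa)]
            p = abs(td_error(ps, pa, pr, s))
            if p > THETA:
                heapq.heappush(pqueue, (-p, (ps, pa)))


def run_episode(max_steps=500):
    state = (0, 0)
    for _ in range(max_steps):
        action = choose_action(state)
        reward, nxt = env_step(state, action)
        # Direct Q-learning update from real experience.
        delta = td_error(state, action, reward, nxt)
        Q[(state, action)] += ALPHA * delta
        # Store the transition in the (tabular) model and queue it if influential.
        model[(state, action)] = (reward, nxt)
        predecessors[nxt].add((state, action))
        if abs(delta) > THETA:
            heapq.heappush(pqueue, (-abs(delta), (state, action)))
        planning_sweep()
        if nxt == GOAL:
            break
        state = nxt


if __name__ == "__main__":
    for _ in range(50):
        run_episode()
    print("Value of start state:", max(Q[((0, 0), a)] for a in ACTIONS))
```

For simplicity the planning sweep above runs inline after each real step; the abstract describes planning as a background task executed during CPU idle time, which in a real controller would require a separate thread or an idle-time scheduler rather than the inline call shown here.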