Abstract

This paper proposes a pursuit-evasion algorithm based on the Option method from hierarchical reinforcement learning and applies it to multi-robot pursuit-evasion games in a dynamic 2D environment. The algorithm's efficiency is evaluated by comparing it with Q-learning. We use the Option method to decompose the complex task and divide the learning process into two parts, high-level learning and low-level learning, and then design a new mechanism that allows the learning process to run in parallel. The simulation results show that the Option algorithm effectively reduces the complexity of the pursuit-evasion task, avoids the curse of dimensionality that affects traditional reinforcement learning, and improves the learning results.