Abstract

Reinforcement learning, sometimes called learning by rewards and punishments, is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment [1]. Over repeated trials, the agent is expected to improve its behavior. In this paper, we simulate the reinforcement learning process of a mobile agent on a grid space and examine whether multiple reinforcement learning agents can speed up learning by sharing their Q-values. We propose a sharing method that takes into account the weight of the experience each agent has acquired when visiting a state and taking an action.
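
To make the idea of experience-weighted sharing concrete, the following is a minimal sketch and not necessarily the exact rule proposed in this paper. It assumes that an agent's experience for a state-action pair can be measured by how many times the agent has visited that pair, and it combines the agents' Q-values as a visit-count-weighted average. The function name `share_q_values`, the dictionary layout, and the fallback to a plain average for unvisited pairs are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# A state-action pair on a grid: ((row, col), action). Hypothetical layout.
StateAction = Tuple[Tuple[int, int], str]


def share_q_values(
    q_tables: List[Dict[StateAction, float]],
    visit_counts: List[Dict[StateAction, int]],
) -> Dict[StateAction, float]:
    """Merge per-agent Q-tables into one table, weighting each agent's
    Q-value by its visit count for that state-action pair (an assumed
    proxy for the weight of the agent's experience)."""
    keys = set()
    for q in q_tables:
        keys.update(q)

    shared: Dict[StateAction, float] = {}
    for key in keys:
        total_visits = sum(n.get(key, 0) for n in visit_counts)
        if total_visits == 0:
            # No recorded experience: fall back to a plain average
            # over the agents that hold a value for this pair.
            values = [q[key] for q in q_tables if key in q]
            shared[key] = sum(values) / len(values)
        else:
            weighted_sum = sum(
                n.get(key, 0) * q.get(key, 0.0)
                for q, n in zip(q_tables, visit_counts)
            )
            shared[key] = weighted_sum / total_visits
    return shared


# Two agents disagree about moving "up" from cell (0, 0).
pair = ((0, 0), "up")
q_a, n_a = {pair: 1.0}, {pair: 3}  # agent A tried this pair three times
q_b, n_b = {pair: 0.0}, {pair: 1}  # agent B tried it once
shared = share_q_values([q_a, q_b], [n_a, n_b])
print(shared[pair])  # (3 * 1.0 + 1 * 0.0) / 4 = 0.75
```

Under this assumed scheme, the more experienced agent's estimate dominates the shared value, so a single poorly explored estimate does not overwrite a well-explored one.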