摘要

Learning automaton (LA) plays an essential part in the theoretical system of reinforcement learning. Not only does LA have important theoretical value, but also has extensive practical value, including many applications in security field. As classic family of LA, estimator algorithms have improved the performance of algorithms to a novel level. However, because of the limitation of estimators, the estimated values are not reliable at the beginning of learning process, which will lead to wrong updates of actions'selection probability vector. Using current update strategy, the offset of wrong updates only relies on large number of extra iterations. Thus, there is still room for enhancement of convergence efficiency. In this paper, for improving the update strategy of probability vector, we proposed a double competitive scheme based learning automata algorithm and proved its Ε-optimality. The simulation results showed that the proposed algorithms reached the highest convergence efficiency and confirmed the validity of proposed probability vector's update strategy.

全文