Abstract

The proposed self-adaptive predictive pursuit policy consists of two procedures: action decision-making and adjustment of the estimate of the evader's action preferences. Because a correct estimate of the opponent's intention helps in winning adversarial games, the policy introduces the concept of action preference to model the opponent's decision-making. An evader often has different action preferences in different situations, so to model the evader's decision-making the pursuer divides the situation space into categories and maintains a separate estimate of the evader's action preferences for each category. The pursuer adjusts the estimate for a given category of situation by observing the evader's actions. The action decision-making procedure consists of situation classification, computation of possible future states, payoff evaluation, and action selection. Decisions are based on a decision tree constructed from expected payoffs. Expected payoffs are aggregated from single payoffs, which are evaluated from the gains of features that reflect the adversarial situation. A simulation with middle-size soccer robots indicates that the proposed policy is effective.
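The two procedures can be illustrated with a minimal sketch. It is not the paper's implementation: the count-based preference update, the one-level decision tree, and all names (`PreferenceEstimator`, `select_action`, `single_payoff`) are assumptions made for illustration, since the abstract does not specify the update rule or the tree depth.

```python
from collections import defaultdict


class PreferenceEstimator:
    """Per-situation estimate of the evader's action preferences.

    Assumption: preferences are estimated as smoothed relative
    frequencies of observed evader actions; the source does not
    state the actual adjustment rule.
    """

    def __init__(self, actions, prior=1.0):
        self.actions = list(actions)
        # One table of pseudo-counts per situation category.
        self.counts = defaultdict(lambda: {a: prior for a in self.actions})

    def observe(self, situation, evader_action):
        # Adjust the estimate for this category after seeing an action.
        self.counts[situation][evader_action] += 1.0

    def preference(self, situation):
        # Normalise the counts into a probability distribution.
        table = self.counts[situation]
        total = sum(table.values())
        return {a: table[a] / total for a in self.actions}


def select_action(situation, pursuer_actions, estimator, single_payoff):
    """Pick the pursuer action with the highest expected payoff.

    Assumption: a one-level tree, where each expected payoff is the
    preference-weighted sum of hypothetical single payoffs.
    """
    prefs = estimator.preference(situation)

    def expected(p_action):
        return sum(prefs[e] * single_payoff(p_action, e)
                   for e in estimator.actions)

    return max(pursuer_actions, key=expected)


# Usage: the evader is seen going left three times in the "near" category.
estimator = PreferenceEstimator(["left", "right"])
for _ in range(3):
    estimator.observe("near", "left")


def matching_payoff(p_action, e_action):
    # Hypothetical single payoff: reward for matching the evader's move.
    return 1.0 if p_action == e_action else 0.0


choice = select_action("near", ["left", "right"], estimator, matching_payoff)
```

Keeping a separate count table per situation category mirrors the idea that the evader's preferences differ between situations; a deeper tree would apply the same weighting recursively over the computed future states.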