Abstract

In this letter, we cast visual tracking as a template matching problem in a Siamese deep convolutional neural network architecture. In contrast to traditional and other deep feature-based tracking methods, the proposed model exploits multilevel convolutional features from a partial view. The model matches the candidate patch and the template patch along the feature dimension of the convolutional features, which yields hundreds of thousands of base matchers. The base matchers from low-level convolutional features have small receptive fields that contain partial details of targets, while the base matchers from high-level convolutional features have large receptive fields that capture semantic information about targets. The model obtains the final strong matcher as a weighted ensemble of all the base matchers. We design an effective weight propagation strategy to update the weights of the base matchers. Moreover, we propose to use cosine distance as the distance metric and a customized squared loss function as the cost function for robustness. Experiments show that our tracker outperforms state-of-the-art trackers in a wide range of tracking scenarios.
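
To make the ensemble idea concrete, the following is a minimal NumPy sketch rather than the implementation used in this letter. It assumes one base matcher per feature channel, each scoring a candidate by the cosine similarity between the template and candidate feature maps of that channel, and a strong matcher formed as a normalized weighted sum. The function names `channel_cosine` and `ensemble_score`, the array shapes, and the normalization by the weight sum are illustrative assumptions. The weight propagation strategy and the customized squared loss are not specified in the abstract and are omitted.

```python
import numpy as np


def channel_cosine(template, candidate, eps=1e-12):
    """Cosine similarity per feature channel (one hypothetical base matcher each).

    template, candidate: arrays of shape (C, H, W) from the same layer.
    Returns an array of shape (C,) with values in [-1, 1].
    """
    t = template.reshape(template.shape[0], -1)
    c = candidate.reshape(candidate.shape[0], -1)
    num = np.sum(t * c, axis=1)
    den = np.linalg.norm(t, axis=1) * np.linalg.norm(c, axis=1) + eps
    return num / den


def ensemble_score(template_feats, candidate_feats, weights):
    """Weighted ensemble of base matchers across several layers.

    template_feats, candidate_feats: lists of (C_l, H_l, W_l) arrays, one per layer.
    weights: list of (C_l,) arrays; assumed positive so the sum is nonzero.
    """
    scores = np.concatenate(
        [channel_cosine(t, c) for t, c in zip(template_feats, candidate_feats)]
    )
    w = np.concatenate(weights)
    # Normalizing by the weight sum is an illustrative choice, not from the source.
    return float(np.dot(w, scores) / np.sum(w))
```

Under these assumptions, a tracker would evaluate `ensemble_score` for each candidate patch and select the candidate with the highest score; low-level layers contribute many fine-detail matchers and high-level layers contribute semantic ones.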