A Multi-part Convolutional Attention Network for Fine-Grained Image Recognition

Authors: Zhong, Weilin*; Jiang, Linfeng; Zhang, Tao; Ji, Jinsheng; Xiong, Huilin
Source: 24th International Conference on Pattern Recognition (ICPR), 2018-08-20 to 2018-08-24.
DOI:10.1109/icpr.2018.8545225

Abstract

The goal of fine-grained image recognition is to recognize hundreds of sub-categories belonging to the same basic-level category (e.g., bird species). It is a highly challenging task due to the large intra-class variance and small inter-class variance. Existing approaches handle the subtle differences among object classes by learning and localizing discriminative parts. However, most part localization methods proceed step by step, first localizing larger parts and then generating smaller parts from the larger ones, which is inefficient. In this paper, we present a Multi-part Convolutional Attention Network (M-CAN), which simultaneously focuses on discriminative image parts at multiple scales. Specifically, a convolutional attention-based part localization network is presented to localize multi-scale parts from different layers of a deep convolutional neural network (CNN). Importantly, our part localization network requires no part annotations, only image labels, which avoids the heavy labor of part labeling. Comprehensive experiments show that our method outperforms state-of-the-art approaches on three challenging fine-grained datasets: CUB-Birds, Stanford-Dogs, and Stanford-Cars.
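
The abstract does not specify the attention module, so the following is only a minimal illustrative sketch of the general idea of localizing parts at several scales from different CNN layers. It is not the authors' M-CAN implementation. The function names `attention_map` and `locate_part`, the per-channel weights, the layer strides, and the rule of "one feature-map cell equals one part box" are all assumptions made for illustration.

```python
import math

def attention_map(features, weights):
    """1x1 convolution over channels followed by a spatial softmax.

    features: C x H x W nested lists; weights: C per-channel weights
    (hypothetical; in a real network these would be learned).
    Returns an H x W map whose entries sum to 1.
    """
    height, width = len(features[0]), len(features[0][0])
    scores = [[sum(w * features[c][y][x] for c, w in enumerate(weights))
               for x in range(width)] for y in range(height)]
    peak = max(max(row) for row in scores)  # subtract max for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def locate_part(attn, stride):
    """Map the attention peak to a square box in input-image pixels.

    stride: assumed downsampling factor of the layer. The box spans one
    feature-map cell, so deeper layers (larger stride) give larger parts.
    """
    height, width = len(attn), len(attn[0])
    y, x = max(((r, c) for r in range(height) for c in range(width)),
               key=lambda rc: attn[rc[0]][rc[1]])
    return (x * stride, y * stride, (x + 1) * stride, (y + 1) * stride)

# Toy feature maps from two hypothetical layers of the same 16x16 image.
shallow = [[[0.0] * 4 for _ in range(4)]]   # 1 channel, 4x4 map, stride 4
shallow[0][1][2] = 5.0                      # strongest response at row 1, col 2
deep = [[[0.0, 3.0], [0.0, 0.0]]]           # 1 channel, 2x2 map, stride 8

# Both layers are processed independently, not coarse-to-fine.
parts = [locate_part(attention_map(f, [1.0]), s)
         for f, s in ((shallow, 4), (deep, 8))]
print(parts)
```

In this toy setup the shallow layer yields a small 4x4-pixel box and the deep layer a larger 8x8-pixel box, each obtained in a single pass rather than by cropping one part out of another. Only image-level supervision would be needed to train such weights, in line with the abstract's claim that no part annotations are required.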