A Deep Multi-Modal CNN for Multi-Instance Multi-Label Image Classification

Song, Lingyun*; Liu, Jun; Qian, Buyue; Sun, Mingxuan; Yang, Kuan; Sun, Meng; Abbas, Samar

doi:10.1109/TIP.2018.2864920

Abstract

Deep convolutional neural networks (CNNs) have shown superior performance on single-label image classification. However, applying CNNs to multi-label images remains an open problem, for two main reasons. First, each image is usually treated as an inseparable entity and represented as a single instance, which mixes the visual information corresponding to different labels. Second, the correlations among labels are often overlooked. To address these limitations, we propose a deep multi-modal CNN for multi-instance multi-label image classification, called MMCNN-MIML. By combining CNNs with multi-instance multi-label (MIML) learning, our model represents each image as a bag of instances for classification and inherits the merits of both CNNs and MIML. In particular, MMCNN-MIML has three main appealing properties: 1) it automatically generates instance representations for MIML by exploiting the architecture of CNNs; 2) it takes advantage of label correlations by grouping labels in its later layers; and 3) it incorporates the textual context of label groups to generate multi-modal instances, which are effective in discriminating visually similar objects belonging to different groups. Empirical studies on several benchmark multi-label image data sets show that MMCNN-MIML significantly outperforms state-of-the-art baselines on multi-label image classification tasks.
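The bag-of-instances idea in the abstract can be illustrated with a minimal sketch. This is not the MMCNN-MIML architecture: it assumes that each spatial position of a convolutional feature map is one instance, that a linear scorer with a sigmoid rates every instance against every label, and that a bag-level score is the maximum over instances (a common MIML assumption). Label grouping and the textual context of label groups are omitted. All function names, weights, and the toy feature map are hypothetical.

```python
import math

def bag_from_feature_map(feature_map):
    """Flatten an H x W grid of C-dimensional vectors into a bag of H*W instances."""
    return [vec for row in feature_map for vec in row]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bag_label_scores(bag, weights, biases):
    """Score every instance against every label, then max-pool over instances.

    Assumption: an image carries a label if at least one instance supports it.
    """
    scores = []
    for w, b in zip(weights, biases):
        instance_scores = [
            sigmoid(sum(wi * xi for wi, xi in zip(w, inst)) + b) for inst in bag
        ]
        scores.append(max(instance_scores))
    return scores

# Toy 2 x 2 feature map with 2 channels (hypothetical values).
feature_map = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
# One hypothetical weight vector and bias per label (3 labels).
weights = [[4.0, -4.0], [-4.0, 4.0], [-4.0, -4.0]]
biases = [-2.0, -2.0, -2.0]

bag = bag_from_feature_map(feature_map)
scores = bag_label_scores(bag, weights, biases)
predicted = [s > 0.5 for s in scores]
```

In this toy case two different instances each support a different label, so the image receives both labels, while the third label has no supporting instance. Representing the image as a single pooled vector would blend those two regions together, which is the mixing problem the abstract describes.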

Publication date: 2018-12
Affiliation: Xi'an Jiaotong University


Cited by: 68

Last updated: 2024-04-19 06:44
