Attentive Linear Transformation for Image Captioning

Ye, Senmao; Han, Junwei<sup>*</sup>; Liu, Nian

doi:10.1109/TIP.2018.2855406

Abstract

We propose a novel attention framework called attentive linear transformation (ALT) for automatic generation of image captions. Instead of learning spatial or channel-wise attention as existing models do, ALT learns to attend to the high-dimensional transformation matrix that maps the image feature space to the context vector space. ALT can therefore learn various relevant feature abstractions, including spatial attention, channel-wise attention, and visual dependence. In addition, we propose a soft threshold regression to predict the spatial attention probabilities, which preserves more relevant local regions than the popular softmax regression. Extensive experiments on the MS COCO and Flickr30k data sets demonstrate the superiority of our model over other state-of-the-art models.

Publication date: 2018-11
Affiliation: Northwestern Polytechnical University

