Abstract

The main challenge for most image-text tasks, such as zero-shot recognition, is how to measure the semantic similarity between visual and textual feature vectors. The common solution is to map the image and text feature vectors into a Hilbert space and then rank their similarity by the inner product between feature vectors. In this paper, we learn feature representations of images and their sentence descriptions with different deep neural networks in order to capture the inter-modal correspondences between visual and language data. We then use a joint embedding structure based on the angle between vectors to measure the semantic similarity between visual and textual features. In the proposed method, a constant factor b keeps the similarities of positive and negative samples separated by a fixed margin. Since the proposed cosine-similarity method involves both normalization and vector computation, we also develop a learning algorithm for the neural networks that express the semantic features of texts and images. We evaluate the angle-based method on the challenging Caltech-UCSD Birds and Oxford-102 Flowers datasets. The experiments demonstrate good performance on both recognition and retrieval tasks.
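To make the angle-based similarity and the role of the constant factor b concrete, the following is a minimal sketch, not the authors' exact formulation: it assumes a standard hinge-style ranking loss in which a positive image-text pair must score at least b higher (in cosine similarity) than a negative pair. The function names and the value of b are illustrative only.

```python
# Illustrative sketch (assumed form, not the paper's exact loss):
# cosine (angle-based) similarity between image and text embeddings,
# with a constant margin b separating positive from negative pairs.
import numpy as np

def cosine_similarity(u, v):
    """Normalize both vectors, then take their inner product (cosine of the angle)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def margin_ranking_loss(img, txt_pos, txt_neg, b=0.2):
    """Hinge-style ranking loss: the positive pair should score at least b
    higher than the negative pair; b plays the role of the constant factor
    mentioned in the abstract. The value 0.2 is a placeholder."""
    s_pos = cosine_similarity(img, txt_pos)
    s_neg = cosine_similarity(img, txt_neg)
    return max(0.0, b - s_pos + s_neg)

# Toy usage with random vectors standing in for CNN / text-encoder features.
rng = np.random.default_rng(0)
img, txt_pos, txt_neg = rng.normal(size=(3, 128))
print(margin_ranking_loss(img, txt_pos, txt_neg))
```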