Abstract

Automatic image annotation methods based on searching for correlations require a high-quality training image dataset. For a target image, the annotation is predicted from the similarity of the target image to the training images. One of the main problems of current methods is their low effectiveness and poor scalability when a relatively large-scale training dataset is used. In this paper we describe our approach, "Automatic image aNNOtation Retriever" (ANNOR), for acquiring annotations for target images, which is based on a combination of local and global features. ANNOR is robust to common transforms (cropping, scaling) that traditional approaches based on global features cannot cope with. We ensure the robustness and generalization needed by complex queries and significantly reduce the number of irrelevant results. We identify objects directly in the target images, and for each obtained annotation we estimate the probability of its relevance. We focus on the way people manually annotate images (the human aspects of image perception). We have designed ANNOR to work with large-scale training image datasets. We present experimental results for three challenging baseline datasets, on which ANNOR improves on the current state of the art.
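The core idea the abstract describes — transferring annotations from visually similar training images and attaching a relevance probability to each tag — can be sketched as a simple nearest-neighbor tag transfer. This is a minimal illustration of the general technique, not ANNOR's actual algorithm: the feature vectors, cosine similarity measure, and function names below are assumptions for the sake of the example.

```python
from collections import defaultdict
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two feature vectors (illustrative choice)."""
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def annotate(target, training, k=3):
    """Transfer tags from the k training images most similar to `target`.

    `training` is a list of (feature_vector, tags) pairs; features are
    assumed to be precomputed (e.g., from local and global descriptors).
    Returns {tag: relevance}, with relevances normalized to sum to 1.
    """
    # Rank training images by similarity to the target.
    ranked = sorted(training, key=lambda item: cosine(target, item[0]),
                    reverse=True)
    # Accumulate each neighbor's tags, weighted by its similarity.
    scores = defaultdict(float)
    for feats, tags in ranked[:k]:
        sim = cosine(target, feats)
        for tag in tags:
            scores[tag] += sim
    # Normalize scores into a relevance estimate per tag.
    total = sum(scores.values())
    return {tag: s / total for tag, s in scores.items()} if total else {}
```

For example, a target whose features closely match training images tagged "sky" would receive "sky" with a high relevance score, while tags from dissimilar images outside the top-k neighborhood are excluded entirely.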

  • Publication date: 2015-04