Abstract

To reduce the large uncertainty in the relatedness between web images and their auxiliary text terms, an automatic image-text alignment algorithm is developed. It achieves more accurate indexing and retrieval of large-scale web images by precisely assigning web images to their most relevant visual text terms. First, large-scale web pages are crawled, and the informative images and their most relevant auxiliary text blocks are extracted. Second, parallel image clustering is performed to partition the large-scale collection of informative web images into a large number of clusters. Because visually similar web images are grouped into the same cluster, our parallel image clustering algorithm significantly reduces the uncertainty in the relatedness between the web images and their auxiliary text terms, which provides a good starting point for automatic image-text alignment. Finally, a relevance re-ranking algorithm is developed to identify the most relevant text terms for characterizing the semantics of the visually similar web images in each cluster, i.e., to assign the web images to their most relevant visual text terms. Our experiments on large-scale web images have yielded very positive results.
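The cluster-then-rank idea can be illustrated with a small sketch. This is not the authors' method. It assumes each image is already reduced to a numeric visual feature vector and a list of auxiliary text terms (both toy data here). It also substitutes plain sequential k-means with farthest-first initialization for the parallel clustering step, and uses a simple TF-IDF-style score (in-cluster term frequency times inverse cluster frequency) in place of the paper's relevance re-ranking algorithm. The names `kmeans`, `rank_terms`, `features`, and `terms` are hypothetical.

```python
import math
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Sequential Lloyd's k-means (stand-in for parallel clustering); returns one label per point."""
    # Farthest-first initialization: start at points[0], then repeatedly
    # add the point farthest from all chosen centers.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each image joins its nearest visual center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its member images.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rank_terms(labels, term_lists, top_n=1):
    """Rank text terms per cluster by in-cluster frequency x inverse cluster frequency."""
    clusters = sorted(set(labels))
    counts = {c: Counter() for c in clusters}
    for label, image_terms in zip(labels, term_lists):
        counts[label].update(set(image_terms))  # count each term once per image
    cluster_freq = Counter()  # number of clusters in which each term appears
    for c in clusters:
        cluster_freq.update(counts[c].keys())
    result = {}
    for c in clusters:
        size = labels.count(c)
        scores = {
            t: (n / size) * math.log(1 + len(clusters) / cluster_freq[t])
            for t, n in counts[c].items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        result[c] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return result


# Toy data (hypothetical): 2-D visual features plus the terms found near each image.
features = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]
terms = [
    ["beach", "sunset", "photo"], ["beach", "sea", "photo"], ["beach", "photo"],
    ["dog", "photo"], ["dog", "puppy", "photo"], ["dog", "park", "photo"],
]

labels = kmeans(features, 2)
print(labels)
print(rank_terms(labels, terms))
```

On this toy data the two visual groups separate cleanly, and the generic term "photo" scores below the cluster-specific terms "beach" and "dog" because it appears in every cluster. This mirrors the abstract's point that clustering visually similar images first lets shared, discriminative terms stand out from noisy surrounding text.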
