Abstract

In this paper, we develop an automatic image-text alignment algorithm that supports more effective indexing and retrieval of large-scale web images by aligning each web image with its most relevant auxiliary text terms or phrases. First, a large number of cross-media web pages (which contain web images and their auxiliary texts) are crawled and segmented into a set of image-text pairs (informative web images and their associated text terms or phrases). Second, near-duplicate image clustering is used to group the web images into clusters according to their visual similarity. Near-duplicate web images in the same cluster share similar semantics and are associated with the same or a similar set of auxiliary text terms or phrases, which co-occur frequently in the relevant text blocks. Near-duplicate image clustering can therefore significantly reduce the uncertainty about how the semantics of web images relate to their auxiliary text terms or phrases. Finally, a random walk is performed over a phrase correlation network to achieve more precise image-text alignment by refining the relevance scores between the web images and their auxiliary text terms or phrases. Experiments on large-scale cross-media web pages have produced very positive results.
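The final refinement step can be illustrated with a minimal sketch. The abstract does not specify the form of the random walk, so the code below assumes a random walk with restart over a small phrase correlation network for a single cluster of near-duplicate images. The function name `refine_scores`, the damping parameter `alpha`, the iteration count, and the toy phrases and weights are all hypothetical choices made for illustration, not the authors' implementation.

```python
def refine_scores(initial, correlation, alpha=0.5, iterations=50):
    """Refine phrase relevance scores with a random walk with restart.

    initial:     dict phrase -> initial relevance score for one image cluster
                 (assumed here to be, e.g., a co-occurrence count)
    correlation: dict phrase -> dict phrase -> non-negative correlation weight
                 (the phrase correlation network; assumed symmetric in the toy data)
    alpha:       probability of following a network edge instead of restarting
                 (hypothetical parameter, not taken from the paper)
    """
    phrases = list(initial)
    total = sum(initial.values())
    if total <= 0:
        raise ValueError("initial scores must have a positive sum")

    # Restart distribution: the normalised initial relevance scores.
    restart = {p: initial[p] / total for p in phrases}

    # Row-normalise the outgoing weights of each phrase into transition
    # probabilities. A phrase with no neighbours keeps its mass (self-loop).
    transition = {}
    for p in phrases:
        row = {q: correlation.get(p, {}).get(q, 0.0) for q in phrases if q != p}
        weight_sum = sum(row.values())
        if weight_sum > 0:
            transition[p] = {q: w / weight_sum for q, w in row.items()}
        else:
            transition[p] = {p: 1.0}

    # Iterate: each phrase receives score from correlated phrases,
    # mixed with its initial (restart) score.
    scores = dict(restart)
    for _ in range(iterations):
        scores = {
            q: alpha * sum(scores[p] * transition[p].get(q, 0.0) for p in phrases)
            + (1 - alpha) * restart[q]
            for q in phrases
        }
    return scores


# Toy example: two strongly correlated phrases and one weakly related phrase.
initial = {"golden gate bridge": 3, "san francisco": 2, "click here": 3}
correlation = {
    "golden gate bridge": {"san francisco": 0.9, "click here": 0.05},
    "san francisco": {"golden gate bridge": 0.9, "click here": 0.05},
    "click here": {"golden gate bridge": 0.05, "san francisco": 0.05},
}
refined = refine_scores(initial, correlation)
for phrase, score in sorted(refined.items(), key=lambda kv: -kv[1]):
    print(f"{phrase}: {score:.3f}")
```

In this toy setting, phrases that are strongly correlated with one another reinforce each other's scores, while a phrase that co-occurs often but is only weakly connected in the network is pushed down the ranking. The restart term keeps the refined scores anchored to the initial relevance estimates.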