Abstract

In this paper, we propose a novel multimodal retrieval model based on the Extreme Learning Machine (ELM). We exploit two multimedia modalities, image and text, to achieve multimodal retrieval. First, we employ probabilistic Latent Semantic Analysis (pLSA) to model the generative processes of texts and images separately, obtaining appropriate representations for each modality. ELM is then used to learn the correlation between the image representations and the text representations, so that multimodal retrieval is carried out by the learned single-hidden-layer feedforward neural networks (SLFNs). Additionally, binary classifiers are trained to improve the accuracy of the multimodal retrieval model. The model can easily be extended to other modalities, and extensive experimental results demonstrate its effectiveness and efficiency.
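To make the core idea concrete, the following is a minimal sketch (not the authors' code) of the ELM step: a single-hidden-layer network with randomly fixed hidden weights whose output weights are solved in closed form, here used to map pLSA topic vectors of texts to those of paired images. The dimensions, the sigmoid activation, and the data are illustrative assumptions.

```python
import numpy as np

def train_elm(X, T, n_hidden=200, seed=0):
    """Fit ELM output weights in closed form: beta = pinv(H) @ T."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # hidden-layer activations (sigmoid)
    beta = np.linalg.pinv(H) @ T                     # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Hypothetical paired data: pLSA topic distributions for texts and images.
text_topics  = np.random.rand(500, 50)   # 500 documents, 50-topic pLSA representations
image_topics = np.random.rand(500, 50)   # the paired images' 50-topic pLSA representations

# Learn the text -> image correlation; retrieval would then rank candidate
# images by similarity between the predicted vector and each image's topic vector.
W, b, beta = train_elm(text_topics, image_topics)
predicted_image_topics = elm_predict(text_topics[:1], W, b, beta)
```

Because only the output weights are solved (via a pseudoinverse) while the hidden weights stay random, training is a single linear least-squares problem, which is the source of ELM's efficiency claimed above.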