Abstract

Data variety is one of the defining features of multimedia big data. Multimedia documents in different data formats and storage structures often express similar semantic information. Managing and retrieving multimedia documents in a way that reflects users' intent in heterogeneous big data environments has therefore become an important issue. In this paper, we present an effective and economical architecture named SHMR (Semantic-based Heterogeneous Multimedia Retrieval), which stores and retrieves semantic information from heterogeneous multimedia data at low cost. First, we address the particularities of heterogeneous multimedia retrieval in big data environments. Second, we propose an approach to extracting and representing semantic information for heterogeneous multimedia documents. Third, we provide a NoSQL-based approach to semantic storage, in which multimedia data can be processed in parallel across distributed nodes. Finally, we present a MapReduce-based retrieval algorithm and design a user-feedback scheme to achieve high retrieval precision and a good user experience. The experimental results indicate that the retrieval performance and economic efficiency of SHMR are suitable for multimedia information retrieval in heterogeneous big data environments.