A MAPREDUCE BASED DISTRIBUTED LSI FOR SCALABLE INFORMATION RETRIEVAL

Liu Yang*; Li Maozhen; Khan Mukhtaj; Qi Man

Abstract

Latent Semantic Indexing (LSI) has been widely used in information retrieval because of its effectiveness in addressing the problems of polysemy and synonymy. However, LSI is notably computationally intensive because of the complexity of the singular value decomposition and filtering operations it involves. This paper presents MR-LSI, a MapReduce-based distributed LSI algorithm for scalable information retrieval. The performance of MR-LSI is first evaluated in a small-scale experimental cluster environment and then in large-scale simulation environments. MR-LSI partitions the dataset into smaller subsets and optimizes the partitioned subsets across a cluster of computing nodes. This significantly reduces the algorithm's overhead while maintaining high accuracy in retrieving documents of user interest. A genetic-algorithm-based load balancing scheme is also designed to optimize the performance of MR-LSI in heterogeneous computing environments, where the computing nodes have varied resources.
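To make the abstract's two ideas concrete, here is a minimal sketch. The first idea is LSI via truncated SVD. The second is splitting the document set into subsets that are indexed independently. It is not the authors' MR-LSI implementation, and it rests on several assumptions:

- It uses NumPy.
- It scores documents by cosine similarity after projecting the query with U_k^T.
- Documents are split round-robin. The abstract does not say how MR-LSI actually forms or optimizes its subsets.
- The MapReduce map and reduce steps are simulated in-process.

The names `build_lsi`, `cosine_scores`, `map_phase`, `reduce_phase`, and `search`, along with the toy term list, are hypothetical.

```python
import numpy as np
from collections import defaultdict

def build_lsi(term_doc, k):
    """Rank-k LSI index via truncated SVD: A ~ U_k S_k V_k^T."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(k, len(s))
    u_k = u[:, :k]
    # Row j equals S_k v_j, which is U_k^T a_j for document column a_j.
    doc_vecs = (s[:k, None] * vt[:k, :]).T
    return u_k, doc_vecs

def cosine_scores(u_k, doc_vecs, query, eps=1e-10):
    """Project the query with U_k^T and score every document by cosine."""
    q = u_k.T @ query
    q_norm = np.linalg.norm(q)
    d_norms = np.linalg.norm(doc_vecs, axis=1)
    scores = np.zeros(len(doc_vecs))
    if q_norm < eps:
        return scores  # query has no component in this index's latent space
    ok = d_norms > eps
    scores[ok] = doc_vecs[ok] @ q / (d_norms[ok] * q_norm)
    return scores

def map_phase(doc_ids, n_parts):
    """Hypothetical mapper: emit (partition, doc) pairs, round-robin."""
    for i, d in enumerate(doc_ids):
        yield i % n_parts, d

def reduce_phase(pairs, term_doc, k):
    """Hypothetical reducer: group docs by partition, build one LSI index each."""
    groups = defaultdict(list)
    for part, d in pairs:
        groups[part].append(d)
    return {part: (docs, *build_lsi(term_doc[:, docs], k))
            for part, docs in groups.items()}

def search(indexes, query, top_n):
    """Score the query in every partition and merge hits by score."""
    hits = []
    for docs, u_k, doc_vecs in indexes.values():
        for d, s in zip(docs, cosine_scores(u_k, doc_vecs, query)):
            hits.append((float(s), d))
    hits.sort(key=lambda h: (-h[0], h[1]))
    return hits[:top_n]

# Toy terms x documents counts (rows: car, automobile, engine, flower, petal, garden).
A = np.array([
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
], dtype=float)
query = np.array([1, 0, 0, 0, 0, 0], dtype=float)  # the single term "car"

u_k, doc_vecs = build_lsi(A, k=2)
print(cosine_scores(u_k, doc_vecs, query).round(3))

indexes = reduce_phase(map_phase(list(range(A.shape[1])), n_parts=2), A, k=2)
print(search(indexes, query, top_n=3))
```

In the single full index, document 1 ("automobile engine") scores as high as document 0 for the query "car", even though it never contains that term. This is the synonymy effect the abstract refers to.

In the naive round-robin split, document 1 lands in a partition where "car" never occurs. The query therefore projects to zero there and document 1 drops out. This hints at why the way the subsets are formed matters for accuracy.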

Publication date: 2014
Affiliations: Sichuan University; Tongji University


