A MAPREDUCE BASED DISTRIBUTED LSI FOR SCALABLE INFORMATION RETRIEVAL

被引:0
作者
Liu, Yang [1 ]
Li, Maozhen [2 ,3 ]
Khan, Mukhtaj [2 ]
Qi, Man [4 ]
机构
[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu, Peoples R China
[2] Brunel Univ, Sch Engn & Design, Uxbridge UB8 3PH, Middx, England
[3] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai, Peoples R China
[4] Canterbury Christ Church Univ, Dept Comp, Canterbury CT1 1QU, Kent, England
关键词
Information retrieval; latent semantic indexing; Map Reduce; load balancing; genetic algorithms;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Latent Semantic Indexing (LSI) has been widely used in information retrieval due to its efficiency in solving the problems of polysemy and synonymy. However, LSI is notably a computationally intensive process because of the computing complexities of singular value decomposition and filtering operations involved in the process. This paper presents MR-LSI, a Map Reduce based distributed LSI algorithm for scalable information retrieval. The performance of MR-LSI is first evaluated in a small scale experimental cluster environment, and subsequently evaluated in large scale simulation environments. By partitioning the dataset into smaller subsets and optimizing the partitioned subsets across a cluster of computing nodes, the overhead of the MR-LSI algorithm is reduced significantly while maintaining a high level of accuracy in retrieving documents of user interest. A genetic algorithm based load balancing scheme is designed to optimize the performance of MR-LSI in heterogeneous computing environments in which the computing nodes have varied resources.
引用
收藏
页码:259 / 280
页数:22
相关论文
共 41 条
[1]  
Aarnio T., 2009, SEM INT
[2]  
Akl S. G., 1999, Parallel Processing Letters, V9, P499, DOI 10.1142/S0129626499000463
[3]  
[Anonymous], REV SCI COMPUTER PRO
[4]  
[Anonymous], 2009, Hadoop: The Definitive Guide
[5]  
[Anonymous], 1999, Modern Information Retrieval
[6]  
Aswani Kumar C., 2006, International Journal of Applied Mathematics and Computer Science, P551
[7]  
Bassu D., P TEXT MIN 2003 WORK
[8]   LARGE-SCALE SPARSE SINGULAR VALUE COMPUTATIONS [J].
BERRY, MW .
INTERNATIONAL JOURNAL OF SUPERCOMPUTER APPLICATIONS AND HIGH PERFORMANCE COMPUTING, 1992, 6 (01) :13-49
[9]  
Dean J., P OSDI 04 6 S OP SYS
[10]  
DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO