Efficient Probabilistic Latent Semantic Indexing using Graphics Processing Unit

被引:5
作者
Kouassi, Eli Koffi [1 ]
Amagasa, Toshiyuki [1 ]
Kitagawa, Hiroyuki [1 ]
机构
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki 3058573, Japan
来源
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS) | 2011年 / 4卷
关键词
Graphic Processing Unit (GPGPU); Clustering; Algorithms; Probabilistic Latent Semantic Indexint (PLSI); Expectation Maximization (EM) Algorithm; COMPUTATION; ALGORITHM;
D O I
10.1016/j.procs.2011.04.040
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
In this paper, we propose a scheme to accelerate the Probabilistic Latent Semantic Indexing (PLSI), which is an automated document indexing method based on a statistical latent semantic model, exploiting the high parallelism of Graphics Processing Unit (GPU). Our proposal is composed of three techniques: the first one is to accelerate the Expectation-Maximization (EM) computation by applying GPU matrix-vector multiplication; the second one uses the same principles as the first method, but deals with the sparseness of co-occurrence of words and documents; and the third one is to use the concurrent kernel execution, which is available on NVIDIA Fermi architecture, in order to speed up the process. We compare the performance of the proposed scheme with the non-parallelized implementation. The results show that our method could be more than 100 times faster than the CPU-based implementation in our environment. By dealing with the sparseness of the data, we could not only process more documents and words using GPU, but we could also keep more data on the device memory so that we can avoid massive data copy transfer between the host and the device susceptible to reduce the execution performance.
引用
收藏
页码:382 / 391
页数:10
相关论文
共 19 条
[1]  
[Anonymous], 1999, CLAIMING PLACE P 15
[2]  
Bakkum P, 2010, ACCELERATING SQL DAT, P94
[3]  
Bell Nathan, 2008, EFFICIENT SPARSE MAT
[4]  
BELLOCH GE, 1990, VECTOR MODELS DATA P
[5]  
Belloch Guy E., 1991, PREFIX SUMS THEIR AP
[6]  
Cavanagh J. M., 2009, GECC 2009, P2505
[7]  
CUBLAS, NVIDIA SDK LIN ALG
[8]  
DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO
[9]  
2-9
[10]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38