Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing

Cited by: 35
Authors
Takizawa, Hiroyuki [1]
Kobayashi, Hiroaki
Affiliations
[1] Tohoku Univ, Grad Sch Informat Sci, Aoba Ku, Sendai, Miyagi 9808578, Japan
[2] Tohoku Univ, Informat Synergy Ctr, Aoba Ku, Sendai, Miyagi 9808578, Japan
Keywords
programmable graphics processing unit (GPU); general-purpose computation on GPU (GPGPU); k-means data clustering; PC cluster; the divide-and-conquer approach
DOI
10.1007/s11227-006-8294-1
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents an effective scheme for clustering a huge data set using a PC cluster system in which each PC is equipped with a commodity programmable graphics processing unit (GPU). The proposed scheme is designed to achieve three-level hierarchical parallel processing of massive data clustering. The divide-and-conquer approach to parallel data clustering is employed to perform coarse-grain parallel processing across multiple PCs with a message-passing mechanism. Moreover, by taking advantage of the GPU's parallel processing capability, the proposed scheme can exploit two types of fine-grain data parallelism at different levels in the nearest neighbor search, which is the most computationally intensive part of the data-clustering process. The performance of the scheme is discussed in comparison with that of an implementation running entirely on the CPU. Experimental results clearly show that the proposed hierarchical parallel processing can remarkably accelerate the data clustering task. In particular, GPU co-processing is quite effective at improving the computational efficiency of parallel data clustering on a PC cluster. Although data transfer from GPU to CPU is generally costly, the acceleration from GPU co-processing significantly reduces the total execution time of data clustering.
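The divide-and-conquer idea in the abstract can be illustrated with a minimal sequential sketch. It is not the authors' implementation: the paper distributes the partitions across PCs with message passing and offloads the nearest neighbor search to the GPU, whereas this sketch runs everything on one CPU. The function names (`nearest`, `kmeans`, `divide_and_conquer_kmeans`), the strided partitioning, and the fixed iteration count are all assumptions made for illustration.

```python
import random

def nearest(point, centroids):
    """Return the index of the centroid closest to point.

    This linear scan is the nearest neighbor search that the abstract
    identifies as the most computationally intensive step.
    """
    best, best_d = 0, float("inf")
    for i, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))  # squared Euclidean
        if d < best_d:
            best, best_d = i, d
    return best

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # requires len(points) >= k
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group;
        # an empty group keeps its previous centroid.
        centroids = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def divide_and_conquer_kmeans(points, k, parts, seed=0):
    """Cluster each partition separately, then cluster the local centroids.

    Assumption: each partition stands in for the share of data that one
    PC would process; here the partitions are handled one after another.
    """
    chunks = [points[i::parts] for i in range(parts)]  # strided split
    local = []
    for j, chunk in enumerate(chunks):
        local.extend(kmeans(chunk, k, seed=seed + j))
    return kmeans(local, k, seed=seed)
```

In a distributed setting, the loop over `chunks` would presumably run concurrently, one partition per node, with the local centroids gathered on one node for the final merge step.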
Pages: 219-234
Page count: 16