Performance improvement of data mining in Weka through multi-core and GPU acceleration: opportunities and pitfalls

Cited by: 10
Authors
Engel, Tiago Augusto [1]
Charao, Andrea Schwertner [1]
Kirsch-Pinheiro, Manuele [2]
Steffenel, Luiz-Angelo [3]
Affiliations
[1] Univ Fed Santa Maria, Lab Sistemas Computacao, BR-97119900 Santa Maria, RS, Brazil
[2] Univ Paris 01, Ctr Rech Informat, F-75231 Paris 05, France
[3] Univ Reims, Equipe SysCom, Lab CReSTIC, Reims, France
Keywords
TOOLKIT;
DOI
10.1007/s12652-015-0292-9
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Data mining tools may be computationally demanding, which has led to increasing interest in parallel computing strategies to improve their performance. While multi-core processors and Graphics Processing Unit (GPU) accelerators have increased the computing power of current desktop computers, we observe that desktop-based data mining tools do not yet take full advantage of these architectures. This paper investigates strategies to improve the performance of Weka, a popular data mining tool, through multi-core and GPU acceleration. Using performance profiling of Weka, we identify operations that could improve data mining performance when parallelized. We select two of these operations and analyze the impact of their parallel execution on Weka's performance. These experiments demonstrate that, while significant speedups can be achieved, not all operations are amenable to parallelization, which reinforces the need for a careful and well-studied selection of candidates.
Pages: 377-390
Number of pages: 14