High-performance data mining with skeleton-based structured parallel programming

Cited by: 26
Authors
Coppola, M [1]
Vanneschi, M [1]
Affiliations
[1] Univ Pisa, Dipartimento Informat, I-56125 Pisa, Italy
Keywords
high performance computing; structured parallel programming; skeletons; data mining; association rules; clustering; classification
DOI
10.1016/S0167-8191(02)00095-9
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We show how to apply a structured parallel programming (SPP) methodology based on skeletons to data mining (DM) problems, reporting results for three commonly used mining techniques: association rules, decision tree induction and spatial clustering. We analyze the structural patterns common to these applications, looking at application performance and software engineering efficiency. Our aim is to state clearly what features an SPP environment should have to be useful for parallel DM. Within SkIE, the skeleton-based parallel programming environment (PPE) that we have developed, we study the different data access patterns of parallel implementations of Apriori, C4.5 and DBSCAN. To allow efficient development of applications on huge databases, we need to address large partition reads, frequent and sparse access to small blocks, and an irregular mix of small and large transfers. We examine the addition of an object/component interface to the skeleton structured model, to simplify the development of environment-integrated parallel DM applications. (C) 2002 Elsevier Science B.V. All rights reserved.
Pages: 793-813
Page count: 21