A distributed framework for parallel data mining using HPJava

Cited by: 0
Authors
Rana, OF [1]
Fisk, D [1]
Affiliations
[1] BT Labs, Ipswich IP5 3RE, Suffolk, England
Keywords
DOI
10.1023/A:1009696924527
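Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract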
Java has become a language of choice for applications executing in heterogeneous environments utilising distributed objects and multithreading. To handle large data sets, scalable and efficient implementations of data mining approaches are required, generally employing computationally intensive algorithms. Conventional Java implementations do not directly provide support for the data structures often encountered in such algorithms, and they also lack repeatability in numerical precision across platforms. This paper describes a distributed framework employing task and data parallelism and implemented in high performance Java (HPJava). Issues of interest for data mining algorithms are identified, and possible solutions discussed for overcoming limitations in the Java Virtual Machine. The framework supports parallelism across workstation clusters, using the message passing interface as middleware, and can support different analysis algorithms, wrapped as Java objects, and linked to various databases using the Java database connectivity interface. Guidelines are provided for implementing parallel and distributed data mining on large data sets, and a proof-of-concept data mining application is analysed using a neural network.
Pages: 146-154
Page count: 9