Distributed multivariate regression using wavelet-based collective data mining

Cited by: 18
Authors
Hershberger, DE [1]
Kargupta, H [1]
Affiliations
[1] Washington State Univ, Sch Elect Engn & Comp Sci, Pullman, WA 99164 USA
Keywords
data mining; distributed data mining; collective data mining; knowledge discovery; wavelets; regression
DOI
10.1006/jpdc.2000.1694
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper presents a method for distributed multivariate regression using wavelet-based collective data mining (CDM). The method seamlessly blends machine learning and the theory of communication with the statistical methods employed in parametric multivariate regression to provide an effective data mining technique for use in a distributed data and computation environment. The technique is applied to two benchmark data sets, producing results that are consistent with those obtained by applying standard parametric regression techniques to centralized data sets. Evaluation of the method in terms of model accuracy as a function of appropriateness of the selected wavelet function, relative number of nonlinear cross-terms, and sample size demonstrates that accurate parametric multivariate regression models can be generated from distributed, heterogeneous data sets with minimal data communication overhead compared to that required to aggregate a distributed data set. Application of this method to linear discriminant analysis, which is related to parametric multivariate regression, produced classification results on the Iris data set that are comparable to those obtained with centralized data analysis. © 2001 Academic Press.
Pages: 372-400
Number of pages: 29