Communication efficient construction of decision trees over heterogeneously distributed data

Cited by: 14
Authors
Giannella, C [1]
Liu, K [1]
Olsen, T [1]
Kargupta, H [1]
Affiliation
[1] Univ Maryland Baltimore Cty, Dept Comp Sci & Elect Engn, Baltimore, MD 21250 USA
Source
FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2004
Keywords
decision trees; distributed data mining; random projection
DOI
10.1109/ICDM.2004.10114
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present an algorithm designed to efficiently construct a decision tree over heterogeneously distributed data without centralizing. We compare our algorithm against a standard centralized decision tree implementation in terms of accuracy as well as the communication complexity. Our experimental results show that by using only 20% of the communication cost necessary to centralize the data we can achieve trees with accuracy at least 80% of the trees produced by the centralized version.
Pages: 67-74
Page count: 8