Learning ELM-Tree from big data based on uncertainty reduction

Cited by: 42
Authors
Wang, Ran [1 ]
He, Yu-Lin [2 ]
Chow, Chi-Yin [1 ]
Ou, Fang-Fang [2 ]
Zhang, Jian [2 ]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[2] Hebei Univ, Coll Math & Comp Sci, Key Lab Machine Learning & Computat Intelligence, Baoding 071002, Hebei, Peoples R China
Funding
National Natural Science Foundation of China (NSFC)
Keywords
Big data classification; Decision tree; ELM-Tree; Extreme learning machine; Uncertainty reduction; PARALLEL; MACHINE; CLASSIFIERS; ATTRIBUTES; REGRESSION; INDUCTION;
DOI
10.1016/j.fss.2014.04.028
Chinese Library Classification (CLC) code
TP301 [Theory and Methods]
Discipline code
081202
Abstract
A challenge in big data classification is the design of highly parallelized learning algorithms. One solution is to apply parallel computation to different components of a learning model. In this paper, we first propose an extreme learning machine tree (ELM-Tree) model based on the heuristics of uncertainty reduction. In the ELM-Tree model, information entropy and ambiguity are used as the uncertainty measures for splitting decision tree (DT) nodes. In addition, to resolve the over-partitioning problem in DT induction, ELMs are embedded as the leaf nodes when the gain ratios of all the available splits are smaller than a given threshold. We then apply parallel computation to five components of the ELM-Tree model, which effectively reduces the computational time for big data classification. Experimental studies demonstrate the effectiveness of the proposed method. (C) 2014 Elsevier B.V. All rights reserved.
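The abstract describes the induction heuristic only at a high level. The minimal Python sketch below (not the authors' code) illustrates one reading of that heuristic under stated assumptions: a node is split on the candidate with the highest information-gain ratio, and when the best gain ratio falls below a threshold (named tau here, an assumed name) the node becomes an ELM leaf, i.e. a single-hidden-layer network with random input weights and least-squares output weights. The ambiguity measure and the five parallelized components mentioned in the abstract are omitted; the tanh activation, the hidden-layer size, and all function names are illustrative assumptions.

import numpy as np

# Illustrative sketch of an ELM-Tree-style induction (assumptions noted above);
# it is NOT the implementation evaluated in the paper.

def entropy(y):
    # Shannon entropy of a label vector.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(x, y, t):
    # Gain ratio of the binary split x <= t.
    left = x <= t
    n, nl = len(y), left.sum()
    if nl == 0 or nl == n:
        return 0.0
    cond = (nl / n) * entropy(y[left]) + ((n - nl) / n) * entropy(y[~left])
    split_info = entropy(left.astype(int))
    return (entropy(y) - cond) / split_info if split_info > 0 else 0.0

class ELMLeaf:
    # Tiny ELM: random hidden layer, output weights solved by pseudo-inverse.
    def __init__(self, X, y, n_hidden=20, seed=0):
        rng = np.random.default_rng(seed)
        self.classes_ = np.unique(y)
        self.W = rng.normal(size=(X.shape[1], n_hidden))
        self.b = rng.normal(size=n_hidden)
        H = np.tanh(X @ self.W + self.b)                          # hidden-layer output
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        self.beta = np.linalg.pinv(H) @ T                         # least-squares output weights

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return self.classes_[np.argmax(H @ self.beta, axis=1)]

def build_elm_tree(X, y, tau=0.05, max_depth=5, depth=0):
    # Split while the best gain ratio is at least tau; otherwise embed an ELM
    # leaf, mirroring the abstract's guard against over-partitioning.
    if len(np.unique(y)) == 1 or depth >= max_depth:
        return ELMLeaf(X, y)
    best_g, best_j, best_t = 0.0, None, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            g = gain_ratio(X[:, j], y, t)
            if g > best_g:
                best_g, best_j, best_t = g, j, t
    if best_g < tau:
        return ELMLeaf(X, y)
    mask = X[:, best_j] <= best_t
    return {"feature": best_j, "threshold": best_t,
            "left": build_elm_tree(X[mask], y[mask], tau, max_depth, depth + 1),
            "right": build_elm_tree(X[~mask], y[~mask], tau, max_depth, depth + 1)}

def predict_tree(node, X):
    # Route samples to their leaves and let each ELM leaf predict.
    if isinstance(node, ELMLeaf):
        return node.predict(X)
    out = np.empty(len(X), dtype=object)
    go_left = X[:, node["feature"]] <= node["threshold"]
    if go_left.any():
        out[go_left] = predict_tree(node["left"], X[go_left])
    if (~go_left).any():
        out[~go_left] = predict_tree(node["right"], X[~go_left])
    return out

In this sketch the threshold tau directly controls how aggressively nodes are turned into ELM leaves: a larger tau stops splitting earlier, which reflects the abstract's use of embedded ELMs to avoid over-partitioning.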
Pages: 79-100
Number of pages: 22