Learning ELM-Tree from big data based on uncertainty reduction

Cited by: 42
Authors
Wang, Ran [1 ]
He, Yu-Lin [2 ]
Chow, Chi-Yin [1 ]
Ou, Fang-Fang [2 ]
Zhang, Jian [2 ]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[2] Hebei Univ, Coll Math & Comp Sci, Key Lab Machine Learning & Computat Intelligence, Baoding 071002, Hebei, Peoples R China
Funding
National Natural Science Foundation of China (NSFC)
Keywords
Big data classification; Decision tree; ELM-Tree; Extreme learning machine; Uncertainty reduction; PARALLEL; MACHINE; CLASSIFIERS; ATTRIBUTES; REGRESSION; INDUCTION;
DOI
10.1016/j.fss.2014.04.028
Chinese Library Classification (CLC) code
TP301 [Theory and Methods]
Discipline code
081202
Abstract
A challenge in big data classification is the design of highly parallelized learning algorithms. One solution is to apply parallel computation to different components of a learning model. In this paper, we first propose an extreme learning machine tree (ELM-Tree) model based on the heuristics of uncertainty reduction. In the ELM-Tree model, information entropy and ambiguity are used as the uncertainty measures for splitting decision tree (DT) nodes. In addition, to resolve the over-partitioning problem in DT induction, ELMs are embedded as the leaf nodes when the gain ratios of all the available splits are smaller than a given threshold. We then apply parallel computation to five components of the ELM-Tree model, which effectively reduces the computational time for big data classification. Experimental studies demonstrate the effectiveness of the proposed method. (C) 2014 Elsevier B.V. All rights reserved.
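The abstract describes the induction heuristic only at a high level. The minimal Python sketch below (not the authors' code) illustrates one reading of that heuristic under stated assumptions: a node is split on the candidate with the highest information-gain ratio, and when the best gain ratio falls below a threshold (named tau here, an assumed name) the node becomes an ELM leaf, i.e. a single-hidden-layer network with random input weights and least-squares output weights. The ambiguity measure and the five parallelized components mentioned in the abstract are omitted; the tanh activation, the hidden-layer size, and all function names are illustrative assumptions.

import numpy as np

# Illustrative sketch of an ELM-Tree-style induction (assumptions noted above);
# it is NOT the implementation evaluated in the paper.

def entropy(y):
    # Shannon entropy of a label vector.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(x, y, t):
    # Gain ratio of the binary split x <= t.
    left = x <= t
    n, nl = len(y), left.sum()
    if nl == 0 or nl == n:
        return 0.0
    cond = (nl / n) * entropy(y[left]) + ((n - nl) / n) * entropy(y[~left])
    split_info = entropy(left.astype(int))
    return (entropy(y) - cond) / split_info if split_info > 0 else 0.0

class ELMLeaf:
    # Tiny ELM: random hidden layer, output weights solved by pseudo-inverse.
    def __init__(self, X, y, n_hidden=20, seed=0):
        rng = np.random.default_rng(seed)
        self.classes_ = np.unique(y)
        self.W = rng.normal(size=(X.shape[1], n_hidden))
        self.b = rng.normal(size=n_hidden)
        H = np.tanh(X @ self.W + self.b)                          # hidden-layer output
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        self.beta = np.linalg.pinv(H) @ T                         # least-squares output weights

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return self.classes_[np.argmax(H @ self.beta, axis=1)]

def build_elm_tree(X, y, tau=0.05, max_depth=5, depth=0):
    # Split while the best gain ratio is at least tau; otherwise embed an ELM
    # leaf, mirroring the abstract's guard against over-partitioning.
    if len(np.unique(y)) == 1 or depth >= max_depth:
        return ELMLeaf(X, y)
    best_g, best_j, best_t = 0.0, None, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            g = gain_ratio(X[:, j], y, t)
            if g > best_g:
                best_g, best_j, best_t = g, j, t
    if best_g < tau:
        return ELMLeaf(X, y)
    mask = X[:, best_j] <= best_t
    return {"feature": best_j, "threshold": best_t,
            "left": build_elm_tree(X[mask], y[mask], tau, max_depth, depth + 1),
            "right": build_elm_tree(X[~mask], y[~mask], tau, max_depth, depth + 1)}

def predict_tree(node, X):
    # Route samples to their leaves and let each ELM leaf predict.
    if isinstance(node, ELMLeaf):
        return node.predict(X)
    out = np.empty(len(X), dtype=object)
    go_left = X[:, node["feature"]] <= node["threshold"]
    if go_left.any():
        out[go_left] = predict_tree(node["left"], X[go_left])
    if (~go_left).any():
        out[~go_left] = predict_tree(node["right"], X[~go_left])
    return out

In this sketch the threshold tau directly controls how aggressively nodes are turned into ELM leaves: a larger tau stops splitting earlier, which reflects the abstract's use of embedded ELMs to avoid over-partitioning.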
Pages: 79-100
Number of pages: 22