MapReduce Distributed Highly Random Fuzzy Forest for Noisy Big Data

Cited by: 0
Authors
Mustafic, Faruk [1]
Xiong, Ning [1]
Herrera, Francisco [2]
Gallego, Sergio Ramírez [2]
Affiliations
[1] Malardalen Univ, Hgsk Pl 1, S-72123 Vasteras, Sweden
[2] Univ Granada, Ave Hosp,S-N, Granada 18010, Spain
Source
2017 13TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD) | 2017
Funding
Swedish Research Council;
Keywords
random forest; fuzzy decision tree; highly random fuzzy forest; noisy Big Data; attribute noise; CLASSIFICATION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The amount of data available to us keeps growing, and such data often contain noise; we call them noisy Big Data. There is an increasing need for learning methods that can handle noisy Big Data in classification tasks. In this paper we propose a highly random fuzzy forest algorithm for learning an ensemble of fuzzy decision trees from a big data set contaminated with attribute noise. We also present a distributed version of the proposed learning algorithm, implemented in the MapReduce framework. Experimental results demonstrate that the proposed algorithm is faster and more accurate than the state-of-the-art approach, particularly in the presence of attribute noise.
Pages: 560-567
Page count: 8
Related papers
23 in total
[1]  
Bache K., 2013, UCI Machine Learning Repository
[2]  
Bechini A., 2016, 2016 IEEE INT C SYST
[3]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[4]   Identifying mislabeled training data [J].
Brodley, CE ;
Friedl, MA .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1999, 11 :131-167
[5]  
De Matteis A.D., 2015, Fuzzy Systems (FUZZ-IEEE), 2015 IEEE International Conference on, P1
[6]  
Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI '04), P137
[7]   On the use of MapReduce for imbalanced big data using Random Forest [J].
del Rio, Sara ;
Lopez, Victoria ;
Manuel Benitez, Jose ;
Herrera, Francisco .
INFORMATION SCIENCES, 2014, 285 :112-137
[8]   Gradualness, uncertainty and bipolarity: Making sense of fuzzy sets [J].
Dubois, Didier ;
Prade, Henri .
FUZZY SETS AND SYSTEMS, 2012, 192 :3-24
[9]   Understandable Big Data: A survey [J].
Emani, Cheikh Kacfah ;
Cullot, Nadine ;
Nicolle, Christophe .
COMPUTER SCIENCE REVIEW, 2015, 17 :70-81
[10]   Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks [J].
Fernandez, Alberto ;
del Rio, Sara ;
Lopez, Victoria ;
Bawakid, Abdullah ;
del Jesus, Maria J. ;
Benitez, Jose M. ;
Herrera, Francisco .
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2014, 4 (05) :380-409