Feature Selection and Classification of Big Data Using MapReduce Framework

Cited by: 0
Authors
Devi, D. Renuka [1 ]
Sasikala, S. [1 ]
Affiliations
[1] Univ Madras, Dept Comp Sci, IDE, Chennai, Tamil Nadu, India
Source
INTELLIGENT COMPUTING, INFORMATION AND CONTROL SYSTEMS, ICICCS 2019 | 2020 / Vol. 1039
Keywords
Feature selection; Machine Learning; Bigdata; Parallel; MapReduce; ALGORITHM;
DOI
10.1007/978-3-030-30465-2_73
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection (FS) plays a vital role in machine learning (ML), but it becomes demanding when applied to voluminous data. Conventional FS methods cannot handle big datasets efficiently, which creates the need for a technology that processes the data in parallel. MapReduce is a programming framework for processing massive data using a "divide and conquer" approach. In this paper, a novel parallel BAT algorithm is proposed for feature selection on big datasets, and the selected features are then classified using a set of known classifiers. The proposed parallel FS technique is highly scalable for big datasets. The experimental results show improved accuracy and comparatively shorter execution time as the number of parallel nodes increases.
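The "divide and conquer" idea behind MapReduce-style feature selection can be illustrated with a minimal sketch. This is not the paper's parallel BAT algorithm, whose details are not given in this record. It assumes a much simpler filter criterion (the spread of per-class feature means), and every name here (`split`, `map_partition`, `reduce_stats`, `rank_features`) is hypothetical. The map step computes partial statistics on each data partition, and the reduce step merges them, so partitions could in principle be processed on separate nodes.

```python
from functools import reduce

def split(rows, n_parts):
    """Divide the rows into n_parts roughly equal partitions."""
    return [rows[i::n_parts] for i in range(n_parts)]

def map_partition(part):
    """Map step: per-class row count and per-feature sums for one partition."""
    stats = {}
    for features, label in part:
        count, sums = stats.get(label, (0, [0.0] * len(features)))
        stats[label] = (count + 1, [s + x for s, x in zip(sums, features)])
    return stats

def reduce_stats(a, b):
    """Reduce step: merge the partial statistics of two partitions."""
    merged = dict(a)
    for label, (count, sums) in b.items():
        c0, s0 = merged.get(label, (0, [0.0] * len(sums)))
        merged[label] = (c0 + count, [x + y for x, y in zip(s0, sums)])
    return merged

def rank_features(rows, n_parts=2):
    """Rank feature indices by the spread of their class means (assumed criterion)."""
    partials = map(map_partition, split(rows, n_parts))
    total = reduce(reduce_stats, partials)
    means = [[s / c for s in sums] for c, sums in total.values()]
    n_features = len(means[0])
    scores = [max(m[j] for m in means) - min(m[j] for m in means)
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

# Toy data (illustrative only): feature 1 separates the classes, feature 0 does not.
rows = [([1.0, 5.0], 0), ([1.1, 9.0], 1), ([0.9, 5.2], 0), ([1.0, 9.1], 1)]
ranking = rank_features(rows, n_parts=2)
```

Because the merged statistics are plain sums and counts, the ranking does not depend on how the rows are partitioned, which is the property that lets the work be spread across more nodes.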
Pages: 666-673
Page count: 8