A hybrid multi-objective firefly and simulated annealing based algorithm for big data classification

Cited: 15
Authors
Devi, S. Gayathri [1 ]
Sabrigiriraj, M. [2 ]
Affiliations
[1] Coimbatore Inst Engn & Technol, Dept Informat Technol, Coimbatore 641109, Tamil Nadu, India
[2] SVS Coll Engn, Dept Elect & Commun Engn, Coimbatore, Tamil Nadu, India
Keywords
big data; big data classification; Hybrid Multi-Objective Firefly and Simulated Annealing (HMOFSA) algorithm; Kernel Support Vector Machine (KSVM); MapReduce (MR) paradigm; meta-heuristic algorithm; Online Feature Selection (OFS); feature selection; MapReduce
DOI
10.1002/cpe.4985
Chinese Library Classification (CLC)
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Efficient management of big data has become increasingly challenging in recent decades. Online Feature Selection (OFS), a form of online learning as opposed to batch learning, allows a classifier to operate with a small, fixed number of features. The main aim of this work is to introduce an OFS algorithm based on a meta-heuristic that exploits the MapReduce paradigm. A novel Hybrid Multi-Objective Firefly and Simulated Annealing (HMOFSA) algorithm is proposed to select an optimal set of features. As a first step, the original big dataset is decomposed into blocks of examples in the map phase. The HMOFSA algorithm is then applied to select features from each block. The resulting partial outcomes are combined into a final feature vector in the reduce phase and evaluated with a Kernel Support Vector Machine (KSVM) classifier. The proposed OFS approach is analyzed with well-known classifiers (Logistic Regression, KSVM, and Naive Bayes) implemented within the Spark framework. Experiments conducted on big datasets containing 66 million samples and 2000 attributes confirm the effectiveness of the proposed work. The proposed KSVM classifier results are reported in terms of precision, recall, geometric mean (G-mean), F-measure, and accuracy.
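The map/reduce structure described in the abstract can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation: the select_features_hmofsa placeholder stands in for the actual firefly + simulated annealing search, and combining partial results by set intersection is only one assumed reduce rule.

from functools import reduce
import numpy as np

def split_into_blocks(X, y, n_blocks):
    """Map-phase input: decompose the dataset into blocks of examples."""
    idx = np.array_split(np.arange(len(y)), n_blocks)
    return [(X[i], y[i]) for i in idx]

def select_features_hmofsa(X_block, y_block, k):
    """Hypothetical placeholder for HMOFSA: a simple correlation ranking
    stands in for the firefly/simulated-annealing hybrid (assumption)."""
    scores = np.abs([np.corrcoef(X_block[:, j], y_block)[0, 1]
                     for j in range(X_block.shape[1])])
    return set(np.argsort(-scores)[:k])   # indices of the k selected features

def combine(partial_a, partial_b):
    """Reduce phase: merge partial feature sets into a final feature vector.
    Intersection is one simple combination rule (assumption)."""
    return partial_a & partial_b

# Toy usage with synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
y = (X[:, 3] + X[:, 7] > 0).astype(int)

blocks = split_into_blocks(X, y, n_blocks=4)                       # map phase
partials = [select_features_hmofsa(Xb, yb, k=10) for Xb, yb in blocks]
final_features = reduce(combine, partials)                         # reduce phase
print(sorted(final_features))

In the paper the same pattern runs inside the Spark/MapReduce framework and the final feature vector is then evaluated with the KSVM classifier; the sketch only shows the block-wise selection and the combination step.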
Pages: 12