A distributed evolutionary based instance selection algorithm for big data using Apache Spark

Cited by: 3
Authors
Qin, Liyang [1]
Wang, Xiaoli [1]
Yin, Linzi [2]
Jiang, Zhaohui [1]
Affiliations
[1] Cent South Univ, Sch Automat, Changsha 410083, Peoples R China
[2] Cent South Univ, Sch Elect Informat, Changsha 410083, Peoples R China
Keywords
Evolutionary algorithm; Instance selection; Apache Spark; Big data; Data reduction
DOI
10.1016/j.asoc.2024.111638
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Instance selection is an important preprocessing technique in data mining and machine learning. In this paper, we propose a novel evolutionary instance selection algorithm for big data. First, we define a coarse-granularity chromosome structure to reduce the size of the search space and the cost of chromosome operations (recombination, mutation, etc.). Then, we propose a stratified evolution strategy that removes the hyperparameter of the classic fitness function and gives precise control over the instance reduction ratio. Finally, we propose a sampling-based fitness function to reduce the time complexity. Experimental results show that the new algorithm completes the instance selection task on data sets with millions of instances within minutes. 10-fold cross-validation also shows that the selected subsets achieve high nearest-neighbor classification accuracy on many data sets.
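The abstract names a sampling-based fitness function but gives no formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that fitness is the 1-nearest-neighbor accuracy of the selected subset, estimated on a random sample of the data rather than on every instance. The names `nearest_label`, `sampled_fitness`, `mask`, and `sample_size` are hypothetical, and the sketch runs on a single machine with no Spark involved.

```python
import random


def nearest_label(point, selected):
    """Return the label of the selected instance closest to `point`."""
    best_label, best_dist = None, float("inf")
    for features, label in selected:
        # Squared Euclidean distance is enough for ranking neighbors.
        dist = sum((a - b) ** 2 for a, b in zip(point, features))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def sampled_fitness(mask, data, sample_size, rng):
    """Estimate the 1-NN accuracy of the subset picked by `mask`.

    Assumption: accuracy is measured on a random sample of `data`
    instead of the full set, which is what cuts the evaluation cost.
    Simplification: a sampled instance that is also selected counts
    as a trivial hit.
    """
    selected = [inst for inst, keep in zip(data, mask) if keep]
    if not selected:
        return 0.0
    sample = rng.sample(data, min(sample_size, len(data)))
    hits = sum(
        1 for features, label in sample
        if nearest_label(features, selected) == label
    )
    return hits / len(sample)


# Toy data: two well-separated classes, each instance is (features, label).
data = [((0.0, 0.0), 0), ((0.1, 0.0), 0), ((5.0, 5.0), 1), ((5.1, 5.0), 1)]
rng = random.Random(0)
both_classes = sampled_fitness([1, 0, 1, 0], data, sample_size=4, rng=rng)
one_class = sampled_fitness([1, 0, 0, 0], data, sample_size=4, rng=rng)
```

With one representative kept per class, every sampled instance is classified correctly; keeping a single instance misclassifies the whole other class. Under this assumption, each evaluation costs roughly `sample_size` times the number of selected instances, rather than growing with the full data set.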
Pages: 16