Simultaneous feature and instance selection in big noisy data using memetic variable neighborhood search

Times cited: 9
Authors
Lin, Chun-Cheng [1, 2, 3]
Kang, Jia-Rong [4]
Liang, Yu-Lin [1]
Kuo, Chih-Chi [1]
Affiliations
[1] National Yang Ming Chiao Tung University, Department of Industrial Engineering and Management, Hsinchu 300, Taiwan
[2] Asia University, Department of Business Administration, Taichung 413, Taiwan
[3] China Medical University, China Medical University Hospital, Department of Medical Research, Taichung 404, Taiwan
[4] Tatung University, Department of Information Management, Taipei 104, Taiwan
Keywords
Big data analysis; Noisy data; Feature selection; Instance selection; Metaheuristic; DIFFERENTIAL EVOLUTION; GENETIC ALGORITHMS; HARMONY SEARCH; OPTIMIZATION; HYBRID; CLASSIFICATION
DOI
10.1016/j.asoc.2021.107855
CLC number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In smart factories, the data collected by Internet-of-Things sensors is enormous and contains a great deal of noise and many missing values. To address this big data problem, metaheuristics are one of the main approaches to data preprocessing, i.e., selecting instances or features before training a model. Previous work on metaheuristic approaches has rarely considered instance selection and feature selection simultaneously, and has rarely focused on big noisy data. This work therefore proposes a hybrid memetic algorithm (MA) with variable neighborhood search (VNS) that selects instances and features simultaneously: MA is effective for data selection, and VNS has been shown to perform well in local search. To evaluate the proposed algorithm, this work creates simulation data by combining datasets from the UCI repository with noisy data. The proposed algorithm is applied to reduce the simulation data, and the reduced data is then used to train a predictive model whose test performance is evaluated. Compared with other metaheuristics, the proposed algorithm achieves a balance between exploration and exploitation. The results also show that the proposed algorithm is more robust than other feature selection methods. (C) 2021 Elsevier B.V. All rights reserved.
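The abstract describes the method only at a high level. As a rough, hypothetical illustration of how simultaneous feature and instance selection can be framed for a memetic algorithm with VNS, the Python sketch below is a generic toy and not the authors' algorithm. Everything in it is an assumption: a single bit string encodes a feature mask followed by an instance mask; fitness is 1-nearest-neighbour validation accuracy minus a small penalty on the fraction of data kept; the local search is reduced VNS (shaking by flipping k random bits for k = 1..k_max, with no inner descent); and the evolutionary loop is steady-state with tournament selection, uniform crossover, and bit-flip mutation. The names `one_nn_accuracy`, `fitness`, `reduced_vns`, and `memetic_vns` are illustrative.

```python
import random
from typing import List, Tuple

Bits = List[int]


def one_nn_accuracy(X_tr, y_tr, X_val, y_val, feat: Bits, inst: Bits) -> float:
    """Validation accuracy of 1-NN using only the selected features and training instances."""
    cols = [j for j, b in enumerate(feat) if b]
    rows = [i for i, b in enumerate(inst) if b]
    if not cols or not rows:
        return 0.0  # an empty selection cannot classify anything
    correct = 0
    for x, y in zip(X_val, y_val):
        nearest = min(rows, key=lambda i: sum((X_tr[i][j] - x[j]) ** 2 for j in cols))
        correct += y_tr[nearest] == y
    return correct / len(y_val)


def fitness(bits: Bits, data, d: int, penalty: float = 0.05) -> float:
    """Assumed objective: accuracy minus a small penalty on the fraction of data kept."""
    X_tr, y_tr, X_val, y_val = data
    feat, inst = bits[:d], bits[d:]
    acc = one_nn_accuracy(X_tr, y_tr, X_val, y_val, feat, inst)
    kept = (sum(feat) / d + sum(inst) / len(inst)) / 2
    return acc - penalty * kept


def reduced_vns(bits: Bits, score, rng: random.Random,
                k_max: int = 3, max_trials: int = 30) -> Tuple[Bits, float]:
    """Reduced VNS: shake in neighbourhood N_k (flip k random bits) and keep improvements."""
    best, best_f = bits[:], score(bits)
    k = 1
    for _ in range(max_trials):
        cand = best[:]
        for p in rng.sample(range(len(cand)), min(k, len(cand))):
            cand[p] ^= 1
        f = score(cand)
        if f > best_f:
            best, best_f, k = cand, f, 1  # improvement: restart from the smallest neighbourhood
        else:
            k = k + 1 if k < k_max else 1  # no improvement: move to the next, larger neighbourhood
    return best, best_f


def memetic_vns(data, d: int, pop_size: int = 10, generations: int = 15, seed: int = 0):
    """Steady-state GA whose offspring are refined by reduced VNS (the memetic step)."""
    rng = random.Random(seed)
    length = d + len(data[1])  # feature bits followed by one bit per training instance

    def score(b: Bits) -> float:
        return fitness(b, data, d)

    scored = []
    for _ in range(pop_size):
        b = [rng.randint(0, 1) for _ in range(length)]
        scored.append((score(b), b))

    def tournament() -> Bits:
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        p1, p2 = tournament(), tournament()
        child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]  # uniform crossover
        for i in range(length):
            if rng.random() < 1.0 / length:
                child[i] ^= 1  # bit-flip mutation
        child, f = reduced_vns(child, score, rng)  # local search on the offspring
        worst = min(range(pop_size), key=lambda i: scored[i][0])
        if f > scored[worst][0]:
            scored[worst] = (f, child)  # replace the worst individual only if the child is better
    best_f, best = max(scored, key=lambda t: t[0])
    return best[:d], best[d:], best_f
```

Called as `memetic_vns((X_tr, y_tr, X_val, y_val), d=...)`, the sketch returns a feature mask, an instance mask, and the fitness of the best individual found. Keeping both masks in one chromosome lets a single search drop an irrelevant feature and a mislabeled (noisy) training instance at the same time, since either change can raise the 1-NN validation accuracy.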
Pages: 15