A systematic review of emerging feature selection optimization methods for optimal text classification: the present state and prospective opportunities

Cited by: 68
Authors
Abiodun, Esther Omolara [1,3]
Alabdulatif, Abdulatif [2]
Abiodun, Oludare Isaac [1,3]
Alawida, Moatsum [1,4]
Alabdulatif, Abdullah [5]
Alkhawaldeh, Rami S. [6]
Affiliations
[1] Univ Sains Malaysia, Sch Comp Sci, George Town, Malaysia
[2] Qassim Univ, Coll Comp, Dept Comp Sci, Buraydah, Saudi Arabia
[3] Univ Abuja, Dept Comp Sci, Abuja, Nigeria
[4] Abu Dhabi Univ, Dept Comp Sci, Abu Dhabi, U Arab Emirates
[5] Qassim Univ, Coll Sci & Arts, Comp Dept, POB 53, Al Rass, Saudi Arabia
[6] Univ Jordan, Dept Comp Informat Syst, Aqaba 77110, Jordan
Keywords
Feature selection; Hyper-heuristics; Metaheuristic algorithm; Optimization; Text classification; PARTICLE SWARM OPTIMIZATION; PIGEON-INSPIRED OPTIMIZATION; ANT COLONY OPTIMIZATION; GREY WOLF OPTIMIZER; GENE SELECTION; DIFFERENTIAL EVOLUTION; ALGORITHM; SEARCH; REGRESSION; METAHEURISTICS
DOI
10.1007/s00521-021-06406-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Specialized data preparation techniques, such as data cleaning, outlier detection, missing value imputation and feature selection (FS), are required to get the most out of data and, consequently, to obtain optimal performance from predictive models in classification tasks. FS is a vital technique that enables a model to run faster, eliminates noisy data, removes redundancy, reduces overfitting, improves precision and increases generalization on test data. While conventional FS techniques have been leveraged for classification tasks over the past few decades, they fail to optimally reduce the high dimensionality of the feature space of texts, which leads to inefficient predictive models. Emerging technologies such as metaheuristic and hyper-heuristic optimization methods provide a new paradigm for FS because they improve classification accuracy, reduce computational and storage demands, and solve complex optimization problems in less time. However, little is known about best practices for case-by-case usage of emerging FS methods. The literature contains both clear and unclear findings on how to apply these methods effectively; when they are applied incorrectly, precision, real-world feasibility and the overall performance of the predictive model suffer. This paper reviews the present state of FS with respect to metaheuristic and hyper-heuristic methods. Through a systematic literature review of over 200 articles, we set out the most recent findings and trends to inform analysts, practitioners and researchers in the field of data analytics who seek clarity in understanding and implementing effective FS optimization methods for improved text classification.
Pages: 15091-15118
Page count: 28