Text classification based on optimization feature selection methods: a review and future directions

Cited by: 0
Authors
Alyasiri O.M. [1, 2]
Cheah Y.-N. [1]
Zhang H. [1]
Al-Janabi O.M. [3]
Abasi A.K. [4]
Affiliations
[1] School of Computer Sciences, Universiti Sains Malaysia, Penang
[2] Karbala Technical Institute, Al-Furat Al-Awsat Technical University, Karbala
[3] College of Medicine, University of Baghdad, Baghdad
[4] Machine Learning Department, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi
Keywords
Feature selection; Machine learning classifiers; Optimization algorithms; Text categorization; Text classification; Text mining
DOI
10.1007/s11042-024-19769-6
Abstract
A substantial portion of today’s multimedia data exists in the form of unstructured text. However, the unstructured nature of text poses a significant task in meeting users’ information requirements. Text classification (TC) has been extensively employed in text mining to facilitate multimedia data processing. However, accurately categorizing texts becomes challenging due to the increasing presence of non-informative features within the corpus. Several reviews on TC, encompassing various feature selection (FS) approaches to eliminate non-informative features, have been previously published. However, these reviews do not adequately cover the recently explored approaches to TC problem-solving utilizing FS, such as optimization techniques. This study comprehensively analyzes different FS approaches based on optimization algorithms for TC. We begin by introducing the primary phases involved in implementing TC. Subsequently, we explore a wide range of FS approaches for categorizing text documents and attempt to organize the existing works into four fundamental approaches: filter, wrapper, hybrid, and embedded. Furthermore, we review four optimization algorithms utilized in solving text FS problems: swarm intelligence-based, evolutionary-based, physics-based, and human behavior-related algorithms. We discuss the advantages and disadvantages of state-of-the-art studies that employ optimization algorithms for text FS methods. Additionally, we consider several aspects of each proposed method and thoroughly discuss the challenges associated with datasets, FS approaches, optimization algorithms, machine learning classifiers, and evaluation criteria employed to assess new and existing techniques. Finally, by identifying research gaps and proposing future directions, our review provides valuable guidance to researchers in developing and situating further studies within the current body of literature. 
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
Pages: 14187-14233
Page count: 46