Advancing text classification: a novel two-stage multi-objective feature selection framework

Cited: 0
Authors
Liu, Yan [1 ,2 ]
Cheng, Xian [3 ]
Stephen, Liao Shaoyi [2 ]
Wei, Shansen [3 ]
Affiliations
[1] Univ Sci & Technol China, Sch Management, Hefei, Anhui, Peoples R China
[2] City Univ Hong Kong, Dept Informat Syst, Hong Kong, Peoples R China
[3] Sichuan Univ, Business Sch, 24 South Sect 1,Yihuan Rd, Chengdu 610065, Peoples R China
Keywords
Text classification; Feature selection; Multi-objective decision making; Data envelopment analysis; Slacks-based measure; Genetic algorithm; Network DEA; Component analysis; Information gain; Model; Efficient; Optimization; Classifiers; Category
DOI
10.1007/s10799-025-00450-9
CLC (Chinese Library Classification)
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline code
1205; 120501
Abstract
Feature selection is a pivotal step in text classification, identifying relevant terms through filter indicators or accuracy measures. Because the many available indicators and measures capture different information, they yield disparate feature selection outcomes. This paper presents a novel two-stage multi-objective feature selection framework that incorporates multiple filter indicators and accuracy measures in its filter and wrapper stages. Employing data envelopment analysis (DEA), the framework addresses the resulting multi-objective decision-making problem by exploring the Pareto-efficient frontier. To assess the framework's efficacy, experiments were conducted on twelve datasets using six distinct classification algorithms. The results highlight the superiority of the DEA filter-wrapper model (DEAFW) constructed from this framework: DEAFW outperformed five single-objective filter models and a one-stage multi-objective filter model across six performance metrics in the majority of cases. For logistic regression, for instance, DEAFW achieved the highest average rank across the twelve datasets on all performance metrics. A further comparison with four existing feature selection techniques confirmed DEAFW's consistent superiority, as it attained the smallest grand average rank across the twelve datasets for most performance metrics.
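To make the DEA step concrete, the sketch below is a minimal illustration (not the authors' implementation), assuming NumPy and SciPy are available: each candidate term is treated as a decision-making unit with a single unit input and its filter-indicator scores as outputs, and the classic CCR multiplier model (Charnes, Cooper and Rhodes, 1978) is solved as a linear program to flag terms on the Pareto-efficient frontier. The indicator names, the unit-input assumption, and the dea_ccr_efficiency helper are hypothetical; the paper's framework additionally involves a slacks-based measure and a wrapper stage that are not shown here.

    # Hypothetical sketch: CCR-style DEA scoring of candidate terms on multiple
    # filter indicators; terms with efficiency 1 lie on the Pareto-efficient frontier.
    import numpy as np
    from scipy.optimize import linprog

    def dea_ccr_efficiency(outputs, inputs):
        """CCR efficiency score for each DMU (output-multiplier form).

        outputs: (n_dmu, n_out) array, larger is better (filter scores per term)
        inputs:  (n_dmu, n_in) array, e.g. a column of ones if terms carry no cost
        """
        n, n_out = outputs.shape
        n_in = inputs.shape[1]
        scores = np.empty(n)
        for o in range(n):
            # Variables: [u_1..u_{n_out}, v_1..v_{n_in}]; maximize u.y_o -> minimize -u.y_o
            c = np.concatenate([-outputs[o], np.zeros(n_in)])
            # u.y_j - v.x_j <= 0 for every DMU j
            A_ub = np.hstack([outputs, -inputs])
            b_ub = np.zeros(n)
            # Normalisation: v.x_o = 1
            A_eq = np.concatenate([np.zeros(n_out), inputs[o]]).reshape(1, -1)
            res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                          bounds=[(0, None)] * (n_out + n_in), method="highs")
            scores[o] = -res.fun
        return scores

    # Toy example: five candidate terms scored by two hypothetical filter
    # indicators (say, information gain and chi-square), each with a unit input.
    filter_scores = np.array([[0.9, 0.2],
                              [0.5, 0.5],
                              [0.2, 0.9],
                              [0.4, 0.4],
                              [0.1, 0.1]])
    eff = dea_ccr_efficiency(filter_scores, np.ones((5, 1)))
    selected = np.where(eff >= 1 - 1e-6)[0]  # indices of Pareto-efficient terms
    print(eff, selected)

In this toy data, only the first and third terms reach efficiency 1; the term scoring (0.5, 0.5) is dominated by a convex combination of the two and is filtered out. A production implementation would typically add a small epsilon lower bound on the multipliers to screen out weakly efficient units.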
Pages: 26