A high-quality feature selection method based on frequent and correlated items for text classification

Cited by: 66
Authors
Farghaly, Heba Mamdouh [1]
Abd El-Hafeez, Tarek [1, 2]
Affiliations
[1] Minia Univ, Fac Sci, Dept Comp Sci, El Minia, Egypt
[2] Deraya Univ, Comp Sci Unit, El Minia, Egypt
Keywords
Feature selection; Dimensionality reduction; Text classification; Association rule mining; Feature interaction;
DOI
10.1007/s00500-023-08587-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is a significant challenge in pattern recognition, especially for classification tasks. The quality of the selected features is critical to building effective models, and poor-quality data makes selection harder. This work uses association analysis from data mining to select meaningful features while addressing redundant information among the selected features. It proposes a novel feature selection technique for text classification based on frequent and correlated items. The method considers both relevance and feature interactions, using association as a metric to evaluate the relationship between the target and the features. The technique was tested on the SMS Spam Collection dataset from the UCI Machine Learning Repository and compared with well-known feature selection methods. The results show that it effectively reduces redundant information while achieving high accuracy (95.155%) using only 6% of the features.
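The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of selecting frequent, class-associated terms and discarding redundant ones. It is not the authors' method. The function name `select_features`, the thresholds, the use of support and confidence as the association measure, the Jaccard overlap as the redundancy test, and the toy messages are all assumptions made for illustration.

```python
from collections import defaultdict


def select_features(docs, labels, min_support=0.3, min_confidence=0.8,
                    max_overlap=0.6):
    """Illustrative only: keep frequent terms that are strongly associated
    with one class, then greedily drop terms that co-occur too often with
    an already selected term. All thresholds are hypothetical."""
    n = len(docs)

    # Map each term to the set of document ids that contain it.
    doc_ids = defaultdict(set)
    for i, tokens in enumerate(docs):
        for term in set(tokens):
            doc_ids[term].add(i)

    scored = []
    for term, ids in doc_ids.items():
        support = len(ids) / n                      # fraction of docs with the term
        if support < min_support:
            continue                                # not a frequent item
        class_counts = defaultdict(int)
        for i in ids:
            class_counts[labels[i]] += 1
        # Confidence of the rule "term -> most common class among its docs".
        confidence = max(class_counts.values()) / len(ids)
        if confidence >= min_confidence:
            scored.append((confidence, support, term))

    # Strongest association first; ties broken by support, then alphabetically.
    scored.sort(key=lambda s: (-s[0], -s[1], s[2]))

    selected = []
    for _, _, term in scored:
        ids = doc_ids[term]
        # Jaccard overlap of document sets stands in for "correlated items".
        redundant = any(
            len(ids & doc_ids[kept]) / len(ids | doc_ids[kept]) > max_overlap
            for kept in selected
        )
        if not redundant:
            selected.append(term)
    return selected


# Toy messages invented for this example (not taken from the UCI dataset).
docs = [
    ["win", "free", "prize"],
    ["free", "prize", "now"],
    ["win", "free", "prize", "call"],
    ["see", "you", "at", "lunch"],
    ["lunch", "at", "noon"],
    ["call", "you", "at", "noon"],
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

selected = select_features(docs, labels)
print(selected)
```

In this toy run, "prize" is dropped because it appears in exactly the same messages as "free", which mirrors the abstract's goal of reducing redundant information; a real pipeline would tune the thresholds and evaluate the selected features with a classifier.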
Pages: 11259-11274
Page count: 16