Feature selection based on feature interactions with application to text categorization

Cited by: 63
Authors
Tang, Xiaochuan [1 ,2 ]
Dai, Yuanshun [2 ]
Xiang, Yanping [2 ]
Affiliations
[1] Chengdu Univ Technol, Sch Cyber Secur, Chengdu 610059, Sichuan, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature selection; Feature interaction; Mutual information; Joint mutual information; Text categorization; Framework;
DOI
10.1016/j.eswa.2018.11.018
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection is an important preprocessing approach for machine learning and text mining. It reduces the dimensionality of high-dimensional data. A popular approach is based on information-theoretic measures. Most existing methods use two- and three-dimensional mutual information terms, which are ineffective in detecting higher-order feature interactions. To fill this gap, we employ two- through five-way interactions for feature selection. We first identify a relaxed assumption to decompose the mutual information-based feature selection problem into a sum of low-order interactions. A direct calculation of the decomposed interaction terms is computationally expensive. We employ five-dimensional joint mutual information, a computationally efficient measure, to estimate the interaction terms. We use the 'maximum of the minimum' nonlinear approach to avoid overestimating feature significance. We also apply the proposed method to text categorization. To evaluate the performance of the proposed method, we compare it with eleven popular feature selection methods on eighteen benchmark datasets and seven text categorization datasets. Experimental results with four different types of classifiers provide concrete evidence that higher-order interactions are effective in improving feature selection methods. (C) 2018 Elsevier Ltd. All rights reserved.
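
The 'maximum of the minimum' idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' five-way method: it is a simplified pairwise variant, assuming discrete features, plug-in entropy estimates, and a greedy search. The names entropy, joint_mi, and max_min_select are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Empirical Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def joint_mi(columns, y):
    """Empirical I(X_1, ..., X_k; Y) = H(X) + H(Y) - H(X, Y) for discrete data."""
    x = list(zip(*columns))      # treat the k features as one joint variable
    xy = list(zip(x, y))
    return entropy(x) + entropy(y) - entropy(xy)


def max_min_select(features, y, k):
    """Greedy 'maximum of the minimum' selection using pairwise joint MI.

    features: dict mapping a feature name to a list of discrete values.
    Assumption: only pairs (candidate, selected) are scored, not the
    higher-order terms described in the abstract.
    """
    remaining = dict(features)
    # Start from the single feature with the highest mutual information with y.
    first = max(remaining, key=lambda f: joint_mi([remaining[f]], y))
    selected = [first]
    del remaining[first]
    while remaining and len(selected) < k:
        # Score a candidate by its worst-case joint MI with any selected
        # feature, then keep the candidate whose worst case is largest.
        def score(f):
            return min(joint_mi([features[f], features[s]], y) for s in selected)

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected
```

Taking the minimum over already selected features keeps a candidate from being credited for information it only appears to add alongside one particular partner, which is the overestimation the abstract refers to. On an XOR-style target, where no single feature is informative, joint_mi on the feature pair recovers the full one bit of information that the individual terms miss.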
Pages: 207-216
Number of pages: 10