Feature selection using Information Gain and decision information in neighborhood decision system

Cited: 27
Authors
Qu, Kanglin [1 ,2 ]
Xu, Jiucheng [1 ,2 ]
Hou, Qincheng [1 ,2 ]
Qu, Kangjian [3 ]
Sun, Yuanhao [1 ,2 ]
Affiliations
[1] Henan Normal Univ, Engn Technol Res Ctr Comp Intelligence & Data Min, Xinxiang 453007, Peoples R China
[2] Henan Normal Univ, Coll Comp & Informat Engn, Xinxiang 453007, Peoples R China
[3] Nanjing Inst Technol, Coll Comp Engn, Nanjing 210000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Neighborhood rough set; Entropy measures; Information Gain; Nonmonotonic algorithm; Feature selection; GENE SELECTION; REDUCTION; CLASSIFIER; ALGORITHM; ENTROPY;
DOI
10.1016/j.asoc.2023.110100
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Feature selection is an important preprocessing technique in data mining: it can improve classification accuracy and shrink the feature space by eliminating redundant features. Because traditional feature selection algorithms suffer from high time complexity and low classification accuracy, an effective algorithm using Information Gain and decision information is designed. The algorithm introduces Information Gain to perform preliminary dimensionality reduction on high-dimensional datasets, and then uses decision information as an evaluation function to select features carrying important information. First, the concept of a joint information granule is defined, and neighborhood information entropy measures are proposed based on it. In addition, the relationships among these measures are studied, which helps characterize the uncertainty in data. Second, a nonmonotonic algorithm using the decision information in the neighborhood information entropy measures is proposed to overcome the shortcomings of algorithms based on monotonic evaluation functions, thereby improving classification accuracy. Third, to reduce the time cost of the designed algorithm on high-dimensional datasets, Information Gain is introduced to preliminarily eliminate irrelevant features. Finally, ablation and comparison experiments on twelve public datasets demonstrate the low time cost and high classification accuracy of the proposed algorithm. (C) 2023 Elsevier B.V. All rights reserved.
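The two-stage pipeline the abstract describes — an Information Gain pre-filter on discretized features, followed by greedy forward selection under a neighborhood-based decision criterion — can be sketched roughly as below. This is a simplified illustration, not the paper's formulation: `neighborhood_consistency` is a plain stand-in for the paper's neighborhood decision-information measure, and the names and parameters (`delta`, `top_k`) are assumptions for the sketch.

```python
import numpy as np

def information_gain(x, y):
    """IG(Y; X) = H(Y) - H(Y|X) for one discrete feature x and labels y."""
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))
    h_cond = 0.0
    for v in np.unique(x):
        mask = x == v
        h_cond += mask.mean() * entropy(y[mask])  # weighted conditional entropy
    return entropy(y) - h_cond

def neighborhood_consistency(X, y, feats, delta=0.2):
    """Fraction of samples whose delta-neighborhood (on the selected
    features) is pure in decision labels -- a stand-in evaluation, not
    the paper's decision-information measure."""
    sub = X[:, feats]
    pure = 0
    for i in range(len(sub)):
        dist = np.linalg.norm(sub - sub[i], axis=1)
        nbr = dist <= delta
        pure += np.all(y[nbr] == y[i])
    return pure / len(sub)

def select_features(X, y, X_disc, top_k=5, delta=0.2):
    """Stage 1: keep the top_k features by Information Gain (computed on a
    discretized copy X_disc). Stage 2: greedy, nonmonotone-friendly forward
    selection that stops when no candidate improves the evaluation."""
    gains = np.array([information_gain(X_disc[:, j], y)
                      for j in range(X.shape[1])])
    candidates = list(np.argsort(gains)[::-1][:top_k])
    selected, best = [], 0.0
    improved = True
    while improved and candidates:
        improved = False
        for f in candidates:
            score = neighborhood_consistency(X, y, selected + [f], delta)
            if score > best:          # strict improvement required
                best, pick = score, f
                improved = True
        if improved:
            selected.append(pick)
            candidates.remove(pick)
    return selected, best
```

On a toy dataset where one feature separates the classes and another is noise, the IG filter ranks the informative feature first and the forward pass stops after selecting it, since adding the noise feature cannot improve the neighborhood score.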
Pages: 15