Feature selection using a sinusoidal sequence combined with mutual information

Cited: 7
Authors
Yuan, Gaoteng [1 ]
Lu, Lu [2 ]
Zhou, Xiaofeng [1 ]
Affiliations
[1] Hohai Univ, Coll Comp & Informat, Nanjing 211100, Peoples R China
[2] Lib Nanjing Forestry Univ, Nanjing 210037, Peoples R China
Keywords
Feature selection; Mutual information; Sinusoidal sequence; High-dimensional data; SSMI algorithm; FEATURE-EXTRACTION; CLASSIFICATION; ALGORITHM; FILTER;
DOI
10.1016/j.engappai.2023.107168
CLC Classification
TP [automation technology, computer technology];
Discipline Code
0812 ;
Abstract
Data classification is among the most common tasks in machine learning, and feature selection is a key step in classification. Common feature selection methods mainly analyze the maximum relevance and minimum redundancy between features and labels while ignoring the number of key features selected, which inevitably wastes effort in subsequent classification training. To solve this problem, a feature selection algorithm (SSMI) based on the combination of sinusoidal sequences and mutual information is proposed. First, the mutual information between each feature and the label is calculated, and interference information in the high-dimensional data is removed according to the mutual information value. Second, a sine function is constructed, and the features are ordered sinusoidally according to the mutual information value and the feature mean value between different classes of the same feature. By adjusting the period and phase of the sequence, the feature set with the largest between-class difference is found, yielding the subset of key features. Finally, three machine learning classifiers (KNN, RF, SVM) are used to classify the key feature subsets, and several feature selection algorithms (JMI, mRMR, CMIM, SFS, etc.) are compared to assess the strengths and weaknesses of each. Compared with other feature selection methods, the SSMI algorithm selects the fewest key features, with an average reduction of 15 features, and improves average classification accuracy by 3% with the KNN classifier. On the HBV and SDHR datasets, SSMI achieved classification accuracies of 81.26% and 83.12%, with sensitivity and specificity of 76.28% and 87.39%, and 68.14% and 86.11%, respectively. This shows that the SSMI algorithm can achieve higher classification accuracy with a smaller feature subset.
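The abstract's first step — scoring each feature by its mutual information with the label and discarding low-scoring "interference" features — can be sketched as follows. This is a minimal illustration of generic MI filtering for discrete features, not the authors' SSMI implementation; the function names (`mutual_information`, `mi_filter`) and the threshold value are hypothetical, and the sinusoidal-ordering step of SSMI is not reproduced here.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information I(X;Y) for discrete arrays, in nats."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))   # joint probability
            px = np.mean(x == xv)                  # marginal of the feature
            py = np.mean(y == yv)                  # marginal of the label
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def mi_filter(X, y, threshold=0.05):
    """Keep feature columns whose MI with the label exceeds the threshold."""
    scores = np.array([mutual_information(X[:, j], y) for j in range(X.shape[1])])
    return np.where(scores > threshold)[0], scores

# Toy example: feature 0 determines the label, feature 1 is random noise,
# so only feature 0 should survive the filter.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = np.column_stack([y, rng.integers(0, 2, 200)])
kept, scores = mi_filter(X, y)
```

In SSMI the surviving features are then re-ranked via the sinusoidal sequence; in this sketch the filtering step alone already separates an informative feature from pure noise.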
Pages: 13