Feature selection using a sinusoidal sequence combined with mutual information

Cited by: 7
Authors
Yuan, Gaoteng [1 ]
Lu, Lu [2 ]
Zhou, Xiaofeng [1 ]
Affiliations
[1] Hohai Univ, Coll Comp & Informat, Nanjing 211100, Peoples R China
[2] Lib Nanjing Forestry Univ, Nanjing 210037, Peoples R China
Keywords
Feature selection; Mutual information; Sinusoidal sequence; High-dimensional data; SSMI algorithm; Feature extraction; Classification; Algorithm; Filter
DOI
10.1016/j.engappai.2023.107168
Chinese Library Classification
TP [automation technology, computer technology]
Discipline code
0812
Abstract
Data classification is the most common task in machine learning, and feature selection is the key step in any classification task. Common feature selection methods focus on maximizing the relevance between features and class labels while minimizing redundancy among features, but they ignore the number of key features retained, which inevitably wastes effort in subsequent classifier training. To address this problem, a feature selection algorithm (SSMI) based on the combination of sinusoidal sequences and mutual information is proposed. First, the mutual information between each feature and the class label is computed, and interfering features in the high-dimensional data are removed according to their mutual information values. Second, a sine function is constructed, and the features are ordered sinusoidally according to each feature's mutual information value and its per-class mean values. By adjusting the period and phase of the sequence, the feature set with the largest inter-class difference is found, yielding the subset of key features. Finally, three machine learning classifiers (KNN, RF, SVM) are used to classify the key feature subsets, and several feature selection algorithms (JMI, mRMR, CMIM, SFS, etc.) are compared to assess the strengths and weaknesses of each method. Compared with the other feature selection methods, the SSMI algorithm retains the fewest key features, an average reduction of 15 features, while average classification accuracy improves by 3% with the KNN classifier. On the HBV and SDHR datasets, the SSMI algorithm achieves classification accuracies of 81.26% and 83.12%, with sensitivity and specificity of 76.28%/87.39% and 68.14%/86.11%, respectively. These results show that the SSMI algorithm achieves higher classification accuracy with a smaller feature subset.
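The first step the abstract describes, scoring each feature by its mutual information with the class label and discarding low-scoring "interference" features, can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: features are assumed discrete, the threshold value (0.05 nats) and the helper names `mutual_information` and `filter_by_mi` are hypothetical, and the subsequent sinusoidal-ordering step of SSMI is omitted.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete arrays."""
    xs, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    for i, j in zip(x_idx, y_idx):          # count joint occurrences
        joint[i, j] += 1
    joint /= joint.sum()                     # joint distribution p(x, y)
    px = joint.sum(axis=1, keepdims=True)    # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)    # marginal p(y)
    nz = joint > 0                           # avoid log(0) on empty cells
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def filter_by_mi(X, y, threshold=0.05):
    """Keep only the columns of X whose MI with the label y exceeds a
    threshold -- the 'remove interference features' step of the abstract."""
    mi = np.array([mutual_information(X[:, k], y) for k in range(X.shape[1])])
    keep = np.where(mi >= threshold)[0]
    return keep, mi

# Toy data: feature 0 is a noisy copy of the label, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = np.column_stack([y ^ (rng.random(200) < 0.1),   # ~10% label flips
                     rng.integers(0, 2, 200)])      # uninformative
keep, mi = filter_by_mi(X, y, threshold=0.05)
print("kept features:", keep)
```

With these toy data, the informative feature clears the threshold while the noise feature's MI stays near zero, so only feature 0 survives; in SSMI the surviving features would then go on to the sinusoidal-ordering stage.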
Pages: 13