A novel feature selection framework for incomplete data

Authors
Guo, Cong [1]
Yang, Wei [1]
Li, Zheng [1]
Liu, Chun [1]
Affiliations
[1] Henan Univ, Sch Comp & Informat Engn, Henan Key Lab Big Data Anal & Proc, Henan Engn Lab Spatial Informat Proc, Kaifeng 475004, Peoples R China
Keywords
Feature selection; Incomplete data; ReliefF; Matrix completion; Missing values; Classification
DOI
10.1016/j.chemolab.2024.105193
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Feature selection on incomplete datasets is a challenging task. Existing methods typically first complete the dataset with an imputation method and then perform feature selection on the imputed data. Because imputation and feature selection are carried out independently, feature importance cannot be taken into account during imputation, even though features in real-world datasets differ in importance. To address this, we propose a novel feature selection framework for incomplete data that considers feature importance. The framework consists mainly of two alternating, iterative stages: the M-stage and the W-stage. In the M-stage, missing values are imputed based on a given feature importance vector and multiple initial imputation results. In the W-stage, an improved ReliefF algorithm learns the feature importance vector from the imputed data. The feature importance vector output by the W-stage in the current iteration serves as the input to the M-stage in the next iteration. Experimental results on datasets with artificially introduced and real missing values demonstrate that the proposed method significantly outperforms other approaches.
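The alternating structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the M-stage or W-stage formulas, so the sketch substitutes a feature-weighted k-nearest-neighbour imputation for the M-stage, basic Relief (one nearest hit and one nearest miss per sample) for the improved ReliefF of the W-stage, and a single mean imputation for the multiple initial imputation results. All function names and parameters (`weighted_knn_impute`, `relief_weights`, `alternate`, `k`, `n_iter`) are hypothetical, and the sketch assumes numeric features and at least two samples per class.

```python
import numpy as np

def weighted_knn_impute(X_missing, X_filled, w, k=3):
    """M-stage stand-in (assumption): refill each missing cell from the k
    nearest rows under a feature-weighted Euclidean distance."""
    X_new = X_filled.copy()
    mask = np.isnan(X_missing)
    for i in np.where(mask.any(axis=1))[0]:
        d = np.sqrt((((X_filled - X_filled[i]) ** 2) * w).sum(axis=1))
        d[i] = np.inf                      # exclude the row itself
        nn = np.argsort(d)[:k]             # indices of the k nearest rows
        X_new[i, mask[i]] = X_filled[nn][:, mask[i]].mean(axis=0)
    return X_new

def relief_weights(X, y):
    """W-stage stand-in (assumption): basic Relief, not the paper's
    improved ReliefF. Returns non-negative weights that sum to 1."""
    n, p = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                  # avoid division by zero
    Xs = (X - X.min(axis=0)) / span        # scale each feature to [0, 1]
    score = np.zeros(p)
    for i in range(n):
        diff = np.abs(Xs - Xs[i])          # per-feature gaps to every row
        dist = diff.sum(axis=1)
        dist[i] = np.inf
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class row
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class row
        score += diff[miss] - diff[hit]
    score = np.clip(score / n, 0, None)
    total = score.sum()
    return score / total if total > 0 else np.full(p, 1.0 / p)

def alternate(X_missing, y, n_iter=5, k=3):
    """Alternate imputation and weight learning for a fixed number of rounds."""
    p = X_missing.shape[1]
    w = np.full(p, 1.0 / p)                # start from uniform importance
    col_mean = np.nanmean(X_missing, axis=0)
    X = np.where(np.isnan(X_missing), col_mean, X_missing)  # single initial imputation
    for _ in range(n_iter):
        X = weighted_knn_impute(X_missing, X, w, k)  # M-stage: impute given w
        w = relief_weights(X, y)                     # W-stage: learn w from imputed data
    return X, w

# Toy data: feature 0 tracks the class label, feature 1 is noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X_full = np.column_stack([y + 0.1 * rng.standard_normal(40),
                          rng.standard_normal(40)])
X_miss = X_full.copy()
X_miss[rng.random(X_full.shape) < 0.1] = np.nan   # knock out roughly 10% of cells
X_imp, w = alternate(X_miss, y)
print(w)
```

The weights learned in one round change the distance used to pick neighbours in the next, which is the coupling between imputation and feature importance that the abstract describes. Observed entries are never overwritten; only the cells that were missing in the input are refilled on each pass.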
Pages: 13