A novel feature selection framework for incomplete data

Cited by: 0
Authors
Guo, Cong [1]
Yang, Wei [1]
Li, Zheng [1]
Liu, Chun [1]
Affiliation
[1] Henan Univ, Sch Comp & Informat Engn, Henan Key Lab Big Data Anal & Proc, Henan Engn Lab Spatial Informat Proc, Kaifeng 475004, Peoples R China
Keywords
Feature selection; Incomplete data; ReliefF; Matrix completion; Missing values; Classification
DOI
10.1016/j.chemolab.2024.105193
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Feature selection on incomplete datasets is a challenging task. Existing methods address it by first imputing the missing values and then performing feature selection on the imputed dataset. Because imputation and feature selection are carried out entirely independently, feature importance cannot be taken into account during imputation, even though different features in real-world datasets vary in importance. To this end, we propose a novel feature selection framework for incomplete data that takes feature importance into account. The framework consists of two stages that alternate iteratively: the M-stage and the W-stage. In the M-stage, missing values are imputed based on a given feature importance vector and multiple initial imputation results. In the W-stage, an improved ReliefF algorithm learns the feature importance vector from the imputed data. The feature importance vector output by the W-stage in one iteration serves as the input to the M-stage in the next iteration. Experimental results on datasets with artificially introduced and naturally occurring missing values show that the proposed method significantly outperforms the compared approaches.
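The alternating M-stage/W-stage loop described in the abstract can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation and makes several loudly stated assumptions. The W-stage uses standard ReliefF, with every sample as a reference, instead of the paper's improved ReliefF. The M-stage is a hypothetical feature-weighted k-nearest-neighbour imputation seeded by the average of two assumed initial imputations (column mean and column median). The function names `relieff_weights`, `m_stage`, and `alternate` and the parameters `k` and `n_iter` are illustrative only.

```python
import numpy as np


def relieff_weights(X, y, k=5):
    """Standard ReliefF weights (a stand-in, NOT the paper's improved ReliefF).

    X: complete (imputed) data, shape (n_samples, n_features); y: class labels.
    """
    n, d = X.shape
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    Xs = (X - lo) / span                      # scale each feature to [0, 1]
    classes = np.unique(y)
    prior = {c: np.mean(y == c) for c in classes}
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(Xs - Xs[i])             # per-feature differences to sample i
        dist = diff.sum(axis=1)               # Manhattan distance used to pick neighbours
        hits = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        if hits.size:                         # k nearest same-class samples lower w
            near = hits[np.argsort(dist[hits])[:k]]
            w -= diff[near].mean(axis=0)
        for c in classes:                     # k nearest samples of each other class raise w
            if c == y[i]:
                continue
            misses = np.flatnonzero(y == c)
            near = misses[np.argsort(dist[misses])[:k]]
            w += prior[c] / (1.0 - prior[y[i]]) * diff[near].mean(axis=0)
    return w / n


def m_stage(X_missing, inits, w, k=5):
    """HYPOTHETICAL M-stage: feature-weighted kNN imputation seeded by initial imputations."""
    mask = np.isnan(X_missing)
    base = np.mean(inits, axis=0)             # fuse the initial imputations by averaging
    lo = base.min(axis=0)
    span = base.max(axis=0) - lo
    span[span == 0] = 1.0
    Z = (base - lo) / span
    wp = np.clip(w, 0.0, None)                # negative ReliefF weights count as zero
    X_hat = base.copy()                       # observed entries are kept unchanged
    for i, j in zip(*np.nonzero(mask)):
        donors = np.flatnonzero(~mask[:, j])  # rows where feature j is observed
        if donors.size == 0:
            continue
        fw = wp.copy()
        fw[j] = 0.0                           # feature j itself is unknown for row i
        if fw.sum() == 0:                     # fall back to uniform weights on other features
            fw = np.ones_like(wp)
            fw[j] = 0.0
        dist = (((Z[donors] - Z[i]) ** 2) * fw).sum(axis=1)
        near = donors[np.argsort(dist)[:k]]
        X_hat[i, j] = X_missing[near, j].mean()
    return X_hat


def alternate(X_missing, y, n_iter=5, k=5):
    """Alternate M-stage and W-stage; each W-stage output feeds the next M-stage."""
    mask = np.isnan(X_missing)
    # Two simple initial imputations (assumed; the paper's choices may differ).
    inits = [np.where(mask, np.nanmean(X_missing, axis=0), X_missing),
             np.where(mask, np.nanmedian(X_missing, axis=0), X_missing)]
    w = np.ones(X_missing.shape[1])           # uniform importance before the first W-stage
    for _ in range(n_iter):
        X_hat = m_stage(X_missing, inits, w, k)
        w = relieff_weights(X_hat, y, k)
    return X_hat, w
```

For example, `X_hat, w = alternate(X_missing, y)`, where `X_missing` is a float array with `np.nan` marking missing entries, returns the final imputed matrix and feature weights. Ranking features by `np.argsort(-w)` then gives an order from which a feature subset could be selected. Keeping the initial imputations fixed and feeding only the weight vector back into the M-stage mirrors the data flow stated in the abstract. How the paper actually combines the initial imputations is not specified here.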
Pages: 13
Related papers
50 in total
  • [1] Bagging and Feature Selection for Classification with Incomplete Data
    Cao Truong Tran
    Zhang, Mengjie
    Andreae, Peter
    Xue, Bing
    APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2017, PT I, 2017, 10199 : 471 - 486
  • [2] Online feature selection and classification with incomplete data
    Kalkan, Habil
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2014, 22 (06) : 1625 - 1636
  • [3] Improving performance for classification with incomplete data using wrapper-based feature selection
    Tran C.T.
    Zhang M.
    Andreae P.
    Xue B.
    Evolutionary Intelligence, 2016, 9 (3) : 81 - 94
  • [4] Improving performance of classification on incomplete data using feature selection and clustering
    Cao Truong Tran
    Zhang, Mengjie
    Andreae, Peter
    Xue, Bing
    Lam Thu Bui
    APPLIED SOFT COMPUTING, 2018, 73 : 848 - 861
  • [5] Optimal and Novel Hybrid Feature Selection Framework for Effective Data Classification
    Venkataraman, Sivakumar
    Selvaraj, Rajalakshmi
    ADVANCES IN SYSTEMS, CONTROL AND AUTOMATION, 2018, 442 : 499 - 514
  • [6] Mutual information criterion for feature selection from incomplete data
    Qian, Wenbin
    Shu, Wenhao
    NEUROCOMPUTING, 2015, 168 : 210 - 220
  • [7] Feature Selection based on Discernibility Function in Incomplete Data with Fuzzy Decision
    Qian, Wenbin
    Shu, Wenhao
    Liu, Jun
    Wang, Yinglong
    2017 IEEE 29TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2017), 2017: 899 - 904
  • [8] Half-Quadratic Minimization for Unsupervised Feature Selection on Incomplete Data
    Shen, Heng Tao
    Zhu, Yonghua
    Zheng, Wei
    Zhu, Xiaofeng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (07) : 3122 - 3135
  • [9] A Classification Method for Incomplete Mixed Data Using Imputation and Feature Selection
    Li, Gengsong
    Zheng, Qibin
    Liu, Yi
    Li, Xiang
    Qin, Wei
    Diao, Xingchun
    APPLIED SCIENCES-BASEL, 2024, 14 (14):
  • [10] An incremental feature selection approach based on information entropy for incomplete data
    Luo, Chuan
    Li, Tianrui
    Yi, Zhang
    IEEE 17TH INT CONF ON DEPENDABLE, AUTONOM AND SECURE COMP / IEEE 17TH INT CONF ON PERVAS INTELLIGENCE AND COMP / IEEE 5TH INT CONF ON CLOUD AND BIG DATA COMP / IEEE 4TH CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2019: 483 - 488