Fuzzy information decomposition incorporated and weighted Relief-F feature selection: When imbalanced data meet incompletion

Cited by: 18
Authors
Dou, Jun [1]
Song, Yan [1]
Wei, Guoliang [2]
Zhang, Yameng [1]
Affiliations
[1] Univ Shanghai Sci & Technol, Dept Control Sci & Engn, Shanghai 200093, Peoples R China
[2] Univ Shanghai Sci & Technol, Coll Sci, Shanghai Key Lab Modern Opt Syst, Shanghai 200093, Peoples R China
Keywords
Imbalanced class; Incomplete data; Feature selection; Weighted Relief-F; Fuzzy information decomposition; MISSING DATA IMPUTATION; MACHINE; SMOTE;
DOI
10.1016/j.ins.2021.10.057
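Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract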
Data classification is an important task in data analysis, but it suffers seriously from unknown features, imbalanced classes, and incomplete data. Despite their practical significance, few results address these three distinct issues together. To address this problem, we propose a novel feature selection method for data subject to incomplete data and imbalanced classes, namely, improved fuzzy information decomposition (IFID) incorporated and weighted Relief-F (WRelief-F) feature selection. The main idea of the proposed feature selection method is threefold. (1) The proposed IFID algorithm can deal with imbalanced classes and incomplete data at the same time. (2) In IFID, a new membership function is provided to appropriately reflect the influence of the observed data on the missing values. On this basis, a more delicate information decomposition is adopted to achieve a better recovery than the traditional FID. (3) After IFID is applied, WRelief-F is put forward to properly take into account the relationship of the target instance to both inter-class and intra-class instances. Finally, experiments on seven public data sets are used to show the effectiveness and universal applicability of the proposed feature selection algorithm. (c) 2021 Elsevier Inc. All rights reserved.
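
The abstract does not give the WRelief-F weighting formulas, so they are not reproduced here. As a rough point of reference only, the sketch below shows the basic two-class Relief update that Relief-F-style methods build on: a feature gains weight when it differs on the nearest instance of another class (the "miss") and loses weight when it differs on the nearest instance of the same class (the "hit"). The function name `relief_weights`, the L1 distance, and the range scaling are assumptions made for illustration, not the authors' method, and the input is assumed to be complete (no missing values).

```python
def relief_weights(X, y):
    """Basic Relief feature weights for numeric, complete data.

    Illustrative sketch only: this is plain Relief with one nearest hit
    and one nearest miss per instance, not the paper's WRelief-F.
    """
    n, d = len(X), len(X[0])

    # Per-feature range, used to scale each difference into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        # Scaled L1 distance (an assumption; other metrics are possible).
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbour available
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward separation from the miss, penalise spread within the class.
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return w


# Toy data: feature 0 separates the classes, feature 1 is noise.
X_demo = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1],
          [0.8, 0.2], [0.9, 0.8], [1.0, 0.4]]
y_demo = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X_demo, y_demo)
```

On this toy input the informative feature receives the larger weight. Features would then be ranked by weight and the top-ranked ones kept; how WRelief-F reweights the hit and miss contributions is described in the paper itself, not here.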
Pages: 417-432
Page count: 16