Fuzzy information decomposition incorporated and weighted Relief-F feature selection: When imbalanced data meet incompletion

Cited by: 18

Authors:

Dou, Jun [1]

Song, Yan [1]

Wei, Guoliang [2]

Zhang, Yameng [1]

Affiliations:

[1] Univ Shanghai Sci & Technol, Dept Control Sci & Engn, Shanghai 200093, Peoples R China

[2] Univ Shanghai Sci & Technol, Coll Sci, Shanghai Key Lab Modern Opt Syst, Shanghai 200093, Peoples R China

Source:

INFORMATION SCIENCES | 2022 / Volume 584

Keywords:

Imbalanced class; Incomplete data; Feature selection; Weighted Relief-F; Fuzzy information decomposition; MISSING DATA IMPUTATION; MACHINE; SMOTE;

DOI:

10.1016/j.ins.2021.10.057

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Data classification is an important computer task in data analysis, which suffers seriously unknown features, imbalanced class, and incomplete data. However, despite their vital yet practical significance, few results have been made on such three distinct issues. To address this problem, we propose a novel feature selection method for the data subject to incomplete data and imbalanced class, namely, improved fuzzy information decomposition (IFID) incorporated and weighted Relief-F (WRelief-F) feature selection. The main idea of the proposed feature selection method is threefold. (1) The proposed IFID algorithm can deal with the imbalanced class and incomplete data at the same time. (2) In IFID, a new membership function is provided to reflect the influence of the observed data on the missing values appropriately. Based on this establishment, a more delicate information decomposition is adopted to make a better recovery than the traditional FID. (3) After using IFID, WRelief-F is put forward to take the relationship of the target instance to inter-class instances and the intra-class instances into consideration in a proper manner. Finally, experiments on the seven public data sets are utilized to show the effectiveness and universal applicability of the proposed feature selection algorithm. (c) 2021 Elsevier Inc. All rights reserved.


Pages: 417-432

Page count: 16

