Feature reduction for imbalanced data classification using similarity-based feature clustering with adaptive weighted K-nearest neighbors

Cited by: 45
Authors
Sun, Lin [1,3]
Zhang, Jiuxiao [1]
Ding, Weiping [2]
Xu, Jiucheng [1]
Affiliations
[1] Henan Normal Univ, Coll Comp & Informat Engn, Xinxiang 453007, Henan, Peoples R China
[2] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
[3] Engn Lab Intelligence Business & Internet Things, Xinxiang 453007, Henan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced data classification; Feature selection; Symmetric uncertainty; Feature clustering; K-nearest neighbors; FEATURE-SELECTION; UNCERTAINTY MEASURES; INFORMATION; DENSITY;
DOI
10.1016/j.ins.2022.02.004
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Most existing imbalanced data classification models focus on the classification performance of majority-class samples, and many clustering algorithms require the initial cluster centers and the number of clusters to be specified manually. To address these drawbacks, this study presents a novel feature reduction method for imbalanced data classification using similarity-based feature clustering with adaptive weighted k-nearest neighbors (AWKNN). First, the similarity between samples is evaluated from the difference between, and the smaller of, the two sample values on each dimension; a similarity measure matrix is then developed to measure the similarity between clusters, after which a new hierarchical clustering model is constructed. New samples are generated by combining the cluster center of each sample cluster with its nearest neighbor. A hybrid sampling model based on the similarity measure is then presented, in which the generated samples are added to the imbalanced data and samples are removed from the majority classes; a balanced decision system is thus constructed from the generated samples and the minority-class samples. Second, to address the issues that traditional symmetric uncertainty considers only the correlation between features and that mutual information ignores the information added after classification, normalized information gain is introduced to design a new symmetric uncertainty between each feature and the other features; the ordered sequence and the average of the symmetric uncertainty differences of each feature are then used to adaptively select the k nearest neighbors of each feature. Moreover, the weight of the k-th nearest neighbor of a feature is defined to obtain the AWKNN density of features and their ordered sequence for feature clustering. Finally, by combining the weighted average redundancy with the symmetric uncertainty between features and decision classes, a criterion of maximum relevance between each feature and the decision classes and minimum redundancy among features in the same cluster is presented to select the optimal feature subset from the feature clusters. Experiments on 29 imbalanced datasets show that the developed algorithm is effective and selects an optimal feature subset with high classification accuracy for imbalanced data. (C) 2022 Elsevier Inc. All rights reserved.
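The abstract repeatedly relies on two reusable ingredients: symmetric uncertainty between discrete variables and a maximum-relevance / minimum-redundancy score within a feature cluster. The Python sketch below illustrates only those ingredients under simplifying assumptions (discrete feature values, a plain relevance-minus-mean-redundancy score); the function names and the toy selection rule are illustrative and are not the authors' exact formulation, which uses a normalized-information-gain-based symmetric uncertainty and a weighted average redundancy.

```python
# Minimal sketch (not the authors' implementation) of the symmetric-uncertainty
# ingredient: SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)). Features are assumed to be
# discrete (continuous features would need discretization first); the cluster-wise
# selection rule below is a simplified illustrative assumption.
import numpy as np
from collections import Counter

def entropy(x):
    """Shannon entropy (base 2) of a discrete sequence."""
    n = len(x)
    return -sum((c / n) * np.log2(c / n) for c in Counter(x).values())

def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for discrete sequences."""
    joint = list(zip(x, y))
    return entropy(x) + entropy(y) - entropy(joint)

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); defined as 0 when both entropies vanish."""
    denom = entropy(x) + entropy(y)
    return 0.0 if denom == 0 else 2.0 * mutual_information(x, y) / denom

def select_from_cluster(X, y, cluster):
    """Pick, from one feature cluster, the feature with maximum relevance to the
    decision classes (SU with y) minus average redundancy (mean SU with the other
    cluster features) -- a simplified stand-in for the paper's criterion."""
    def score(j):
        relevance = symmetric_uncertainty(X[:, j], y)
        others = [symmetric_uncertainty(X[:, j], X[:, k]) for k in cluster if k != j]
        redundancy = float(np.mean(others)) if others else 0.0
        return relevance - redundancy
    return max(cluster, key=score)

# Tiny usage example with discrete toy data.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 5))
y = X[:, 2]  # feature 2 duplicates the class label, so it is perfectly relevant
print(select_from_cluster(X, y, cluster=[0, 1, 2]))  # expected: 2
```

In the toy run, feature 2 attains the maximal symmetric uncertainty with the decision classes and is therefore chosen as its cluster's representative, mirroring the role of the max-relevance / min-redundancy criterion described above.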
Pages: 591-613
Number of pages: 23
Related papers (10 of 50 shown)
  • [1] Yang, Yu; Yeh, Hen-Geul; Zhang, Wenlu; Lee, Calvin J.; Meese, Emily N.; Lowe, Christopher G. Feature Extraction, Selection, and K-Nearest Neighbors Algorithm for Shark Behavior Classification Based on Imbalanced Dataset. IEEE SENSORS JOURNAL, 2021, 21 (05): 6429-6439.
  • [2] Li, Shuangjie; Zhang, Kaixiang; Chen, Qianru; Wang, Shuqin; Zhang, Shaoqiang. Feature Selection for High Dimensional Data Using Weighted K-Nearest Neighbors and Genetic Algorithm. IEEE ACCESS, 2020, 8: 139512-139528.
  • [3] Shi, Zhan. Improving k-Nearest Neighbors Algorithm for Imbalanced Data Classification. 3RD ANNUAL INTERNATIONAL CONFERENCE ON CLOUD TECHNOLOGY AND COMMUNICATION ENGINEERING, 2020, 719.
  • [4] Zhao, J.; Chen, L.; Wu, R.-X.; Zhang, B.; Han, L.-Z. Density peaks clustering algorithm with K-nearest neighbors and weighted similarity. Kongzhi Lilun Yu Yingyong/Control Theory and Applications, 2022, 39 (12): 2349-2357.
  • [5] Gbashi, Samuel M.; Adedeji, Paul A.; Olatunji, Obafemi O.; Madushele, Nkosinathi. Optimal feature selection for a weighted k-nearest neighbors for compound fault classification in wind turbine gearbox. RESULTS IN ENGINEERING, 2025, 25.
  • [6] Jo, Taeho. Using K Nearest Neighbors for Text Segmentation with Feature Similarity. 2017 INTERNATIONAL CONFERENCE ON COMMUNICATION, CONTROL, COMPUTING AND ELECTRONICS ENGINEERING (ICCCCEE), 2017.
  • [7] Yu, Xiao-gao; Yu, Xiao-peng. Locally Adaptive Text Classification based k-nearest Neighbors. 2007 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-15, 2007: 5651+.
  • [8] Bugata, Peter; Drotar, Peter. Weighted k-nearest neighbors feature selection for high-dimensional multi-class data. 2019 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2019: 3066-3073.
  • [9] Sang, Binbin; Xu, Weihua; Chen, Hongmei; Li, Tianrui. Active Antinoise Fuzzy Dominance Rough Feature Selection Using Adaptive K-Nearest Neighbors. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2023, 31 (11): 3944-3958.
  • [10] Anvari, S.; Azgomi, M. Abdollahi; Dishabi, M. R. Ebrahimi; Maheri, M. Weighted K-nearest neighbors classification based on Whale optimization algorithm. IRANIAN JOURNAL OF FUZZY SYSTEMS, 2023, 20 (03): 61-74.