Feature reduction for imbalanced data classification using similarity-based feature clustering with adaptive weighted K-nearest neighbors

Cited by: 45
Authors
Sun, Lin [1,3]
Zhang, Jiuxiao [1]
Ding, Weiping [2]
Xu, Jiucheng [1]
Affiliations
[1] Henan Normal Univ, Coll Comp & Informat Engn, Xinxiang 453007, Henan, Peoples R China
[2] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
[3] Engn Lab Intelligence Business & Internet Things, Xinxiang 453007, Henan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced data classification; Feature selection; Symmetric uncertainty; Feature clustering; K-nearest neighbors; FEATURE-SELECTION; UNCERTAINTY MEASURES; INFORMATION; DENSITY;
DOI
10.1016/j.ins.2022.02.004
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Most existing imbalanced data classification models focus on the classification performance of majority-class samples, and many clustering algorithms require the initial cluster centers and the number of clusters to be specified manually. To address these drawbacks, this study presents a novel feature reduction method for imbalanced data classification using similarity-based feature clustering with adaptive weighted k-nearest neighbors (AWKNN). First, the similarity between samples is evaluated from the difference between, and the smaller of, the two sample values on each dimension; a similarity measure matrix is then developed to measure the similarity between clusters, after which a new hierarchical clustering model is constructed. New samples are generated by combining the cluster center of each sample cluster with its nearest neighbor. A hybrid sampling model based on the similarity measure is then presented, in which the generated samples are added to the imbalanced data and samples are removed from the majority classes; a balanced decision system is thus constructed from the generated samples and the minority-class samples. Second, to address the issues that traditional symmetric uncertainty considers only the correlation between features and that mutual information ignores the information added after classification, normalized information gain is introduced to design a new symmetric uncertainty between each feature and the other features; the ordered sequence and the average of the symmetric uncertainty differences of each feature are then used to adaptively select the k nearest neighbors of each feature. Moreover, the weight of the k-th nearest neighbor of a feature is defined to obtain the AWKNN density of features and their ordered sequence for feature clustering. Finally, by combining the weighted average redundancy with the symmetric uncertainty between features and decision classes, a criterion of maximum relevance between each feature and the decision classes and minimum redundancy among features in the same cluster is presented to select the optimal feature subset from the feature clusters. Experiments on 29 imbalanced datasets show that the developed algorithm is effective and selects an optimal feature subset with high classification accuracy for imbalanced data. (C) 2022 Elsevier Inc. All rights reserved.
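The abstract repeatedly relies on two reusable ingredients: symmetric uncertainty between discrete variables and a maximum-relevance / minimum-redundancy score within a feature cluster. The Python sketch below illustrates only those ingredients under simplifying assumptions (discrete feature values, a plain relevance-minus-mean-redundancy score); the function names and the toy selection rule are illustrative and are not the authors' exact formulation, which uses a normalized-information-gain-based symmetric uncertainty and a weighted average redundancy.

```python
# Minimal sketch (not the authors' implementation) of the symmetric-uncertainty
# ingredient: SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)). Features are assumed to be
# discrete (continuous features would need discretization first); the cluster-wise
# selection rule below is a simplified illustrative assumption.
import numpy as np
from collections import Counter

def entropy(x):
    """Shannon entropy (base 2) of a discrete sequence."""
    n = len(x)
    return -sum((c / n) * np.log2(c / n) for c in Counter(x).values())

def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for discrete sequences."""
    joint = list(zip(x, y))
    return entropy(x) + entropy(y) - entropy(joint)

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); defined as 0 when both entropies vanish."""
    denom = entropy(x) + entropy(y)
    return 0.0 if denom == 0 else 2.0 * mutual_information(x, y) / denom

def select_from_cluster(X, y, cluster):
    """Pick, from one feature cluster, the feature with maximum relevance to the
    decision classes (SU with y) minus average redundancy (mean SU with the other
    cluster features) -- a simplified stand-in for the paper's criterion."""
    def score(j):
        relevance = symmetric_uncertainty(X[:, j], y)
        others = [symmetric_uncertainty(X[:, j], X[:, k]) for k in cluster if k != j]
        redundancy = float(np.mean(others)) if others else 0.0
        return relevance - redundancy
    return max(cluster, key=score)

# Tiny usage example with discrete toy data.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 5))
y = X[:, 2]  # feature 2 duplicates the class label, so it is perfectly relevant
print(select_from_cluster(X, y, cluster=[0, 1, 2]))  # expected: 2
```

In the toy run, feature 2 attains the maximal symmetric uncertainty with the decision classes and is therefore chosen as its cluster's representative, mirroring the role of the max-relevance / min-redundancy criterion described above.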
Pages: 591-613
Number of pages: 23
Related papers (10 of 50 shown)
  • [1] Yang, Yu; Yeh, Hen-Geul; Zhang, Wenlu; Lee, Calvin J.; Meese, Emily N.; Lowe, Christopher G. Feature Extraction, Selection, and K-Nearest Neighbors Algorithm for Shark Behavior Classification Based on Imbalanced Dataset. IEEE SENSORS JOURNAL, 2021, 21 (05): 6429-6439.
  • [2] Li, Shuangjie; Zhang, Kaixiang; Chen, Qianru; Wang, Shuqin; Zhang, Shaoqiang. Feature Selection for High Dimensional Data Using Weighted K-Nearest Neighbors and Genetic Algorithm. IEEE ACCESS, 2020, 8: 139512-139528.
  • [3] Shi, Zhan. Improving k-Nearest Neighbors Algorithm for Imbalanced Data Classification. 3RD ANNUAL INTERNATIONAL CONFERENCE ON CLOUD TECHNOLOGY AND COMMUNICATION ENGINEERING, 2020, 719.
  • [4] Zhao, J.; Chen, L.; Wu, R.-X.; Zhang, B.; Han, L.-Z. Density peaks clustering algorithm with K-nearest neighbors and weighted similarity. Kongzhi Lilun Yu Yingyong/Control Theory and Applications, 2022, 39 (12): 2349-2357.
  • [5] Gbashi, Samuel M.; Adedeji, Paul A.; Olatunji, Obafemi O.; Madushele, Nkosinathi. Optimal feature selection for a weighted k-nearest neighbors for compound fault classification in wind turbine gearbox. RESULTS IN ENGINEERING, 2025, 25.
  • [6] Jo, Taeho. Using K Nearest Neighbors for Text Segmentation with Feature Similarity. 2017 INTERNATIONAL CONFERENCE ON COMMUNICATION, CONTROL, COMPUTING AND ELECTRONICS ENGINEERING (ICCCCEE), 2017.
  • [7] Yu, Xiao-gao; Yu, Xiao-peng. Locally Adaptive Text Classification based k-nearest Neighbors. 2007 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-15, 2007: 5651+.
  • [8] Bugata, Peter; Drotar, Peter. Weighted k-nearest neighbors feature selection for high-dimensional multi-class data. 2019 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2019: 3066-3073.
  • [9] Sang, Binbin; Xu, Weihua; Chen, Hongmei; Li, Tianrui. Active Antinoise Fuzzy Dominance Rough Feature Selection Using Adaptive K-Nearest Neighbors. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2023, 31 (11): 3944-3958.
  • [10] Anvari, S.; Azgomi, M. Abdollahi; Dishabi, M. R. Ebrahimi; Maheri, M. Weighted K-nearest neighbors classification based on Whale optimization algorithm. IRANIAN JOURNAL OF FUZZY SYSTEMS, 2023, 20 (03): 61-74.