Synthetic minority oversampling using edited displacement-based k-nearest neighbors

Cited by: 24
Authors
Wang, Alex X. [1]
Chukova, Stefanka S. [1]
Nguyen, Binh P. [1]
Affiliations
[1] Victoria Univ Wellington, Sch Math & Stat, Wellington 6012, New Zealand
Keywords
Imbalance classification; Hybrid resampling; SMOTE; Centroid displacement; k-NN; Imbalanced data; Classification; Identification
DOI
10.1016/j.asoc.2023.110895
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Skewed class proportions in real-world datasets present a challenge for machine learning algorithms, which tend to classify the majority class correctly while misclassifying the minority class. Such disparities have significant implications, particularly in predictive scenarios involving minority groups, where misclassifying minority instances could lead to adverse outcomes. To tackle this, class imbalance learning has gained attention, with the Synthetic Minority Oversampling Technique (SMOTE) being a notable approach that addresses class imbalance by generating synthetic minority-class instances from their feature-space neighbors. Despite its effectiveness and simplicity, SMOTE is known to suffer from noise propagation, in which noisy and uninformative samples are introduced. While various SMOTE variants, including hybrids with undersampling, have been developed to tackle this problem, identifying noisy samples in complex real-world datasets remains a challenge. To address this, our study introduces a new SMOTE-based hybrid approach called SMOTE-centroid displacement-based k-NN (SMOTE-CDNN). SMOTE-CDNN employs centroid displacement for class prediction, which is more robust against noisy data. After SMOTE is applied, an instance is treated as noise and removed, yielding clearer decision boundaries, if the label predicted for it by our centroid displacement-based k-NN algorithm differs from its actual label. Our experiments on 24 imbalanced datasets demonstrate the resilience and efficiency of the proposed algorithm, which outperforms state-of-the-art resampling algorithms with various classification models; nevertheless, we acknowledge the need for further investigation into specific dataset characteristics and classification scenarios to determine the generalizability of our approach.
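The two-stage idea in the abstract (oversample with SMOTE, then drop instances whose predicted label disagrees with their given label) can be sketched roughly as follows. This is an illustrative reading under stated assumptions, not the authors' implementation: the abstract does not spell out the centroid displacement rule, so the sketch assumes the predicted class is the one whose neighbor centroid moves least when the query point is added. The function names `smote`, `cdnn_predict`, and `smote_cdnn`, the default `k=5`, and the choice to balance the classes exactly are all hypothetical.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """SMOTE-style interpolation between a minority point and one of its
    k nearest minority neighbors (needs at least 2 minority samples)."""
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbor
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]        # k nearest minority neighbors
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))             # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

def cdnn_predict(X, y, i, k=5):
    """Assumed centroid displacement rule: among the k nearest neighbors of
    X[i], pick the class whose centroid shifts least when X[i] joins it."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                            # exclude the query point itself
    idx = np.argsort(d)[:k]
    best_label, best_shift = None, np.inf
    for label in np.unique(y[idx]):
        members = X[idx][y[idx] == label]
        centroid = members.mean(axis=0)
        new_centroid = (members.sum(axis=0) + X[i]) / (len(members) + 1)
        shift = np.linalg.norm(new_centroid - centroid)
        if shift < best_shift:
            best_label, best_shift = label, shift
    return best_label

def smote_cdnn(X, y, minority=1, k=5, rng=0):
    """Oversample the minority class, then remove every instance whose
    predicted label differs from its given label."""
    X_min = X[y == minority]
    n_new = max(0, int((y != minority).sum()) - len(X_min))
    X_syn = smote(X_min, n_new, k, rng)
    X_all = np.vstack([X, X_syn])
    y_all = np.concatenate([y, np.full(n_new, minority)])
    keep = np.array([cdnn_predict(X_all, y_all, i, k) == y_all[i]
                     for i in range(len(X_all))])
    return X_all[keep], y_all[keep]
```

Calling `smote_cdnn(X, y)` on a NumPy feature matrix and an integer label vector returns the resampled and cleaned pair. The brute-force distance computations keep the sketch short but scale quadratically, so a neighbor index would be preferable for larger datasets.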
Pages: 12
Related papers
50 in total
  • [21] An Interval Valued K-Nearest Neighbors Classifier
    Derrac, Joaquin
    Chiclana, Francisco
    Garcia, Salvador
    Herrera, Francisco
    PROCEEDINGS OF THE 2015 CONFERENCE OF THE INTERNATIONAL FUZZY SYSTEMS ASSOCIATION AND THE EUROPEAN SOCIETY FOR FUZZY LOGIC AND TECHNOLOGY, 2015, 89 : 378 - 384
  • [22] Hypersphere anchor loss for K-Nearest neighbors
    Ye, Xiang
    He, Zihang
    Wang, Heng
    Li, Yong
    APPLIED INTELLIGENCE, 2023, 53 (24) : 30319 - 30328
  • [23] Anomaly Detection-based Spectrum Sensing using the k-Nearest Neighbors Algorithm
    Lopez-Lopez, Lizeth
    Andrade, Angel G.
    Galaviz, Guillermo
    2022 IEEE MEXICAN INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE (ENC), 2022,
  • [24] Feature Extraction, Selection, and K-Nearest Neighbors Algorithm for Shark Behavior Classification Based on Imbalanced Dataset
    Yang, Yu
    Yeh, Hen-Geul
    Zhang, Wenlu
    Lee, Calvin J.
    Meese, Emily N.
    Lowe, Christopher G.
    IEEE SENSORS JOURNAL, 2021, 21 (05) : 6429 - 6439
  • [25] Probability-Based Synthetic Minority Oversampling Technique
    Altwaijry, Najwa
    IEEE ACCESS, 2023, 11 : 28831 - 28839
  • [26] Weather Prediction and Classification Using Neural Networks and k-Nearest Neighbors
    Mantri, Rhea
    Raghavendra, Kulkarni Rakshit
    Puri, Harshita
    Chaudhary, Jhanavi
    Bingi, Kishore
    2021 8TH INTERNATIONAL CONFERENCE ON SMART COMPUTING AND COMMUNICATIONS (ICSCC), 2021, : 263 - 268
  • [27] ROBUST CFAR RADAR DETECTION USING A K-NEAREST NEIGHBORS RULE
    Coluccia, Angelo
    Fascista, Alessio
    Ricci, Giuseppe
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4692 - 4696
  • [28] Relative Angle Correction for Distance Estimation Using K-Nearest Neighbors
    Madray, Ian
    Suire, Jason
    Desforges, Jeremy
    Madani, Mohammad R.
    IEEE SENSORS JOURNAL, 2020, 20 (14) : 8155 - 8163
  • [29] Enhancing the Irish NFI using k-nearest neighbors and a genetic algorithm
    McInerney, Daniel
    Barrett, Frank
    McRoberts, Ronald E.
    Tomppo, Erkki
    CANADIAN JOURNAL OF FOREST RESEARCH, 2018, 48 (12) : 1482 - 1494
  • [30] A K-nearest neighbors-based classification approach for automated detection of knee osteoarthritis
    Cengizler, Caglar
    Kabakci, Ayse Gul
    CUKUROVA MEDICAL JOURNAL, 2023, 48 (02) : 715 - 722