Synthetic minority oversampling using edited displacement-based k-nearest neighbors

Cited by: 24
Authors
Wang, Alex X. [1]
Chukova, Stefanka S. [1]
Nguyen, Binh P. [1]
Affiliations
[1] Victoria Univ Wellington, Sch Math & Stat, Wellington 6012, New Zealand
Keywords
Imbalance classification; Hybrid resampling; SMOTE; Centroid displacement; k-NN; Imbalanced data; Classification; Identification
DOI
10.1016/j.asoc.2023.110895
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Skewed class proportions in real-world datasets challenge machine learning algorithms, which tend to classify the majority class correctly while misclassifying the minority class. This disparity matters most in predictive scenarios involving minority groups, where misclassifying a minority instance can lead to adverse outcomes. Class imbalance learning has therefore gained attention, and the Synthetic Minority Oversampling Technique (SMOTE) is a notable approach: it addresses class imbalance by generating synthetic minority-class instances from their feature-space neighbors. Despite its effectiveness and simplicity, SMOTE is known to suffer from noise propagation, in which noisy and uninformative samples are introduced. Various SMOTE variants, including hybrids with undersampling, have been developed to tackle this problem, but identifying noisy samples in complex real-world datasets remains a challenge. To address this, our study introduces a new SMOTE-based hybrid approach called SMOTE-centroid displacement-based k-NN (SMOTE-CDNN). SMOTE-CDNN employs centroid displacement for class prediction, which is more robust against noisy data. After SMOTE is applied, an instance is treated as noise and removed, yielding clearer decision boundaries, if the label predicted for it by our centroid displacement-based k-NN algorithm differs from its real label. Our experiments on 24 imbalanced datasets demonstrate the resilience and efficiency of the proposed algorithm, which outperforms state-of-the-art resampling algorithms with various classification models. We nevertheless acknowledge the need for further investigation into specific dataset characteristics and classification scenarios to determine the generalizability of our approach.
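The two stages described in the abstract (oversample with SMOTE, then drop instances whose predicted label disagrees with the real one) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `smote`, `cdnn_predict`, and `smote_cdnn` are hypothetical. The displacement rule is an assumption, since the abstract does not spell it out: among the k nearest neighbors, the predicted class is taken to be the one whose neighbor centroid moves least when the query point is added. The sketch also assumes a binary problem, at least two minority samples, and oversampling to a 1:1 class ratio.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Basic SMOTE: interpolate between minority points and their k nearest minority neighbors."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)  # assumes n >= 2
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the k nearest minority neighbors
    base = rng.integers(0, n, size=n_new)  # seed point for each synthetic sample
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))           # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

def cdnn_predict(x, X, y, k=5):
    """Assumed centroid-displacement rule: pick the class whose centroid shifts least when x joins it."""
    idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    best_label, best_disp = None, np.inf
    for label in np.unique(y[idx]):
        pts = X[idx][y[idx] == label]
        centroid = pts.mean(axis=0)
        new_centroid = (pts.sum(axis=0) + x) / (len(pts) + 1)
        disp = np.linalg.norm(new_centroid - centroid)
        if disp < best_disp:
            best_label, best_disp = label, disp
    return best_label

def smote_cdnn(X, y, minority, k=5, seed=0):
    """Oversample the minority class, then remove every instance whose predicted label differs from its own."""
    X_min = X[y == minority]
    n_new = max(0, int(np.sum(y != minority)) - len(X_min))  # balance to 1:1 (assumption)
    X_all = np.vstack([X, smote(X_min, n_new, k, seed)])
    y_all = np.concatenate([y, np.full(n_new, minority)])
    keep = np.ones(len(y_all), dtype=bool)
    for i in range(len(y_all)):
        others = np.arange(len(y_all)) != i  # leave the instance itself out
        keep[i] = cdnn_predict(X_all[i], X_all[others], y_all[others], k) == y_all[i]
    return X_all[keep], y_all[keep]
```

Calling `smote_cdnn(X, y, minority=1)` on a feature matrix `X` and label vector `y` returns the resampled and edited data. The editing loop is quadratic in the number of instances, which is acceptable for illustration but would need a neighbor index for larger datasets.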
Pages: 12