Synthetic minority oversampling using edited displacement-based k-nearest neighbors

Cited by: 24
Authors
Wang, Alex X. [1]
Chukova, Stefanka S. [1]
Nguyen, Binh P. [1]
Affiliations
[1] Victoria Univ Wellington, Sch Math & Stat, Wellington 6012, New Zealand
Keywords
Imbalance classification; Hybrid resampling; SMOTE; Centroid displacement; k-NN; Imbalanced data; Classification; Identification
DOI
10.1016/j.asoc.2023.110895
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Skewed class proportions in real-world datasets present a challenge for machine learning algorithms, which tend to classify the majority class correctly while misclassifying the minority class. Such classification disparities have significant implications, particularly in predictive scenarios involving minority groups, where misclassifying minority instances could lead to adverse outcomes. To tackle this, class imbalance learning has gained attention, with the Synthetic Minority Oversampling Technique (SMOTE) being a notable approach that addresses class imbalance by generating synthetic minority-class instances from their feature-space neighbors. Despite its effectiveness and simplicity, SMOTE is known to suffer from noise propagation, in which noisy and uninformative samples are introduced. While various SMOTE variants, including hybrids with undersampling, have been developed to tackle this problem, identifying noisy samples in complex real-world datasets remains a challenge. To address this, our study introduces a new SMOTE-based hybrid approach called SMOTE-centroid displacement-based k-NN (SMOTE-CDNN). SMOTE-CDNN employs centroid displacement for class prediction, which is more robust against noisy data. After SMOTE is applied, an instance is treated as noise and removed, yielding clearer decision boundaries, if the label predicted for it by our centroid displacement-based k-NN algorithm differs from its actual label. While our experiments on 24 imbalanced datasets demonstrate the resilience and efficiency of our proposed algorithm, which outperforms state-of-the-art resampling algorithms with various classification models, we acknowledge the need for further investigation into specific dataset characteristics and classification scenarios to determine the generalizability of our approach.
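The two-stage procedure described in the abstract (oversample with SMOTE, then drop instances whose predicted label disagrees with their actual label) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not spell out the centroid displacement rule, so the sketch assumes one plausible reading: among the k nearest neighbors, the predicted class is the one whose neighbor centroid moves least when the query point is added to it. All function names (smote, cdnn_predict, smote_cdnn) are hypothetical, and the sketch assumes a binary problem with at least two minority samples.

```python
import math
import random


def k_nearest(point, pool, k, skip=None):
    """Indices of the k points in `pool` closest to `point`, optionally skipping one index."""
    order = sorted((i for i in range(len(pool)) if i != skip),
                   key=lambda i: math.dist(point, pool[i]))
    return order[:k]


def smote(minority, n_new, k, rng):
    """Basic SMOTE: interpolate between a minority sample and one of its k minority neighbors."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority[i], minority, k, skip=i))
        gap = rng.random()  # position along the segment from sample i to neighbor j
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def cdnn_predict(x, neighbors, labels):
    """ASSUMED rule: choose the class whose neighbor centroid is displaced least by adding x."""
    by_class = {}
    for p, lab in zip(neighbors, labels):
        by_class.setdefault(lab, []).append(p)

    def displacement(members):
        return math.dist(centroid(members), centroid(members + [x]))

    return min(by_class, key=lambda lab: displacement(by_class[lab]))


def smote_cdnn(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class to parity, then remove instances the CDNN rule mislabels."""
    rng = random.Random(seed)
    minority = [p for p, lab in zip(X, y) if lab == minority_label]
    n_new = max(len(X) - 2 * len(minority), 0)  # binary case: majority count minus minority count
    X_res = list(X) + smote(minority, n_new, k, rng)
    y_res = list(y) + [minority_label] * n_new
    keep = []
    for i, (p, lab) in enumerate(zip(X_res, y_res)):
        nn = k_nearest(p, X_res, k, skip=i)  # neighbors exclude the point itself
        pred = cdnn_predict(p, [X_res[j] for j in nn], [y_res[j] for j in nn])
        if pred == lab:
            keep.append(i)
    return [X_res[i] for i in keep], [y_res[i] for i in keep]
```

Calling smote_cdnn(X, y, minority_label=1, k=5) on a list of feature vectors X and labels y returns the edited, resampled data. The brute-force neighbor search keeps the sketch dependency-free; a spatial index would be a natural substitute for larger datasets.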
Pages: 12