Synthetic minority oversampling using edited displacement-based k-nearest neighbors

Times cited: 24
Authors
Wang, Alex X. [1]
Chukova, Stefanka S. [1]
Nguyen, Binh P. [1]
Affiliation
[1] Victoria University of Wellington, School of Mathematics and Statistics, Wellington 6012, New Zealand
Keywords
Imbalance classification; Hybrid resampling; SMOTE; Centroid displacement; k-NN; Imbalanced data; Classification; Identification
DOI
10.1016/j.asoc.2023.110895
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Skewed class proportions in real-world datasets present a challenge for machine learning algorithms, which tend to classify the majority class correctly while misclassifying the minority class. Such classification disparities have significant implications, particularly in predictive scenarios involving minority groups, where misclassifying minority instances could lead to adverse outcomes. Class imbalance learning has therefore gained attention, and the Synthetic Minority Oversampling Technique (SMOTE) is a notable approach that addresses class imbalance by generating synthetic minority-class instances from the feature-space neighbors of existing minority samples. Despite its effectiveness and simplicity, SMOTE is known to suffer from noise propagation, in which noisy and uninformative samples are introduced. Various SMOTE variants, including hybrids with undersampling, have been developed to tackle this problem, but identifying noisy samples in complex real-world datasets remains a challenge. To address this, our study introduces a new SMOTE-based hybrid approach called SMOTE-centroid displacement-based k-NN (SMOTE-CDNN). SMOTE-CDNN employs centroid displacement for class prediction, which is more robust against noisy data. After SMOTE is applied, an instance is treated as noise and removed, yielding clearer decision boundaries, if the label predicted for it by our centroid displacement-based k-NN algorithm differs from its true label. Our experiments on 24 imbalanced datasets demonstrate the resilience and efficiency of the proposed algorithm, which outperforms state-of-the-art resampling algorithms with various classification models; nevertheless, we acknowledge the need for further investigation into specific dataset characteristics and classification scenarios to determine the generalizability of our approach.
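The abstract describes the SMOTE-CDNN pipeline in prose only. Below is a minimal, illustrative Python sketch of that pipeline, written under our own assumptions and not taken from the authors' implementation. It assumes binary labels, SMOTE oversampling of the minority class up to the majority count, and Euclidean distance. It also assumes that "centroid displacement" means the shift of a class's centroid, among the k nearest neighbors, when the query point is added. Finally, it assumes one leave-one-out editing pass over every instance after oversampling. The function names (`cdnn_predict`, `smote`, `smote_cdnn`) and the defaults `k=5` are hypothetical.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def cdnn_predict(X, y, query, k=5):
    """Centroid displacement-based k-NN prediction (illustrative sketch).

    Among the k nearest training points, group neighbours by class and
    measure how far each class centroid moves when the query is added.
    The class whose centroid moves least wins. The shift equals
    ||query - centroid|| / (n_c + 1), so larger, closer groups are favoured.
    """
    order = sorted(range(len(X)), key=lambda i: euclidean(X[i], query))[:k]
    groups = defaultdict(list)
    for i in order:
        groups[y[i]].append(X[i])
    best_label, best_shift = None, math.inf
    for label, pts in groups.items():
        shift = euclidean(centroid(pts), centroid(pts + [query]))
        if shift < best_shift:
            best_label, best_shift = label, shift
    return best_label


def smote(minority, n_new, k=5, rng=None):
    """Classic SMOTE interpolation; needs at least two minority points."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        nearest = sorted(others, key=lambda p: euclidean(p, base))[:k]
        partner = rng.choice(nearest)
        gap = rng.random()  # uniform in [0, 1): new point lies on the segment
        synthetic.append([b + gap * (q - b) for b, q in zip(base, partner)])
    return synthetic


def smote_cdnn(X, y, minority_label, k_smote=5, k_cdnn=5, seed=0):
    """Hypothetical SMOTE-CDNN pipeline: oversample, then edit with CDNN.

    Assumption: binary problem, classes balanced 1:1 before editing, and a
    single leave-one-out CDNN pass over every instance (original and
    synthetic) that drops instances whose predicted label differs from
    their own label.
    """
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_majority = sum(1 for lab in y if lab != minority_label)
    new_points = smote(minority, max(n_majority - len(minority), 0), k_smote, rng)
    X_all = list(X) + new_points
    y_all = list(y) + [minority_label] * len(new_points)
    X_kept, y_kept = [], []
    for i, (x, lab) in enumerate(zip(X_all, y_all)):
        rest_X = X_all[:i] + X_all[i + 1:]  # leave the instance itself out
        rest_y = y_all[:i] + y_all[i + 1:]
        if cdnn_predict(rest_X, rest_y, x, k_cdnn) == lab:
            X_kept.append(x)
            y_kept.append(lab)
    return X_kept, y_kept
```

On a toy set with a single mislabelled majority point inside the minority cluster, `smote_cdnn(X, y, minority_label=1)` should drop that point and keep the clean majority cluster. The editing pass follows the SMOTE-ENN pattern, with the CDNN rule swapped in for the plain majority vote. It is quadratic in the number of samples, so a practical version would use a spatial index for the neighbor searches.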
Pages: 12