A Constructive Method for Data Reduction and Imbalanced Sampling

Cited by: 0
Authors
Liu, Fei [1]
Yan, Yuanting [1]
Affiliations
[1] Anhui Univ, Artificial Intelligence Inst, Sch Comp Sci & Technol, Hefei 230601, Anhui, Peoples R China
Source
ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT III | 2024 / Vol. 14489
Funding
National Natural Science Foundation of China;
Keywords
constructive covering algorithm; data reduction; undersampling; class imbalance; INSTANCE SELECTION; CLASSIFICATION;
DOI
10.1007/978-981-97-0798-0_28
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
A large amount of training data leads to high computational cost in instance-based classification. Currently, one of the mainstream methods for reducing data size is to select a representative subset of samples based on spatial partitioning. However, selecting a representative subset while maintaining the overall underlying distribution structure of the dataset remains a challenge. This paper therefore proposes a constructive data reduction method for classification problems, called Constructive Covering Sampling (CCS). CCS does not rely on any parameters. It iteratively partitions the original data space into a group of data subspaces, each of which contains several samples of the same class, and then selects representative samples from these subspaces. This maintains the original data distribution structure and reduces data size, and it also reduces problem complexity and improves the learning efficiency of the classifier. Furthermore, CCS can be extended into an effective undersampling method (CCUS) to address class imbalance. Experiments on 18 KEEL and UCI datasets demonstrate that the proposed method outperforms other sampling methods in terms of F-measure, G-mean, AUC, and accuracy.
Pages: 476-489
Page count: 14
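Illustrative sketch
The partition-then-select idea described in the abstract can be sketched roughly as follows. This is not the authors' implementation: the abstract does not specify how cover centers or representatives are chosen, so the sketch assumes Euclidean distance, takes the lowest-index uncovered sample as each center, sets each cover's boundary at the distance to the nearest sample of a different class, and picks the member closest to the cover's centroid as its representative. The function names `constructive_covers` and `ccs_sample` are hypothetical.

```python
import math


def constructive_covers(X, y):
    """Partition samples into class-pure covers (assumed Euclidean variant)."""
    uncovered = set(range(len(X)))
    covers = []
    while uncovered:
        c = min(uncovered)  # assumption: deterministic center choice
        # Distance from the center to the nearest sample of another class.
        d1 = min(
            (math.dist(X[c], X[j]) for j in range(len(X)) if y[j] != y[c]),
            default=math.inf,
        )
        # Uncovered same-class samples strictly inside that boundary.
        # The center is always included so the loop is guaranteed to advance.
        members = [
            j for j in sorted(uncovered)
            if j == c or (y[j] == y[c] and math.dist(X[c], X[j]) < d1)
        ]
        covers.append(members)
        uncovered -= set(members)
    return covers


def ccs_sample(X, y):
    """Return sorted indices of one representative per cover."""
    selected = []
    for members in constructive_covers(X, y):
        dim = len(X[members[0]])
        centroid = [sum(X[j][k] for j in members) / len(members) for k in range(dim)]
        # Assumption: the representative is the member nearest the centroid.
        selected.append(min(members, key=lambda j: math.dist(X[j], centroid)))
    return sorted(selected)


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    y = [0, 0, 0, 1, 1, 1]
    print(constructive_covers(X, y))
    print(ccs_sample(X, y))
```

No radius or neighbor count is supplied by the caller, which mirrors the abstract's claim that the method does not rely on parameters; the cover boundaries come entirely from the data.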