A Constructive Method for Data Reduction and Imbalanced Sampling

Authors
Liu, Fei [1]
Yan, Yuanting [1]
Affiliations
[1] Anhui Univ, Artificial Intelligence Inst, Sch Comp Sci & Technol, Hefei 230601, Anhui, Peoples R China
Source
ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT III | 2024 / Vol. 14489
Funding
National Natural Science Foundation of China;
Keywords
constructive covering algorithm; data reduction; undersampling; class imbalance; INSTANCE SELECTION; CLASSIFICATION;
DOI
10.1007/978-981-97-0798-0_28
Chinese Library Classification number
TP31 [Computer Software];
Subject classification code
081202 ; 0835 ;
Abstract
A large amount of training data leads to high computational cost in instance-based classification. Currently, one of the mainstream methods to reduce data size is to select a representative subset of samples based on spatial partitioning. However, selecting a representative subset while maintaining the overall underlying distribution structure of the dataset remains a challenge. Therefore, this paper proposes a constructive data reduction method called Constructive Covering Sampling (CCS) for classification problems. CCS does not rely on any parameters. It iteratively partitions the original data space into a group of data subspaces, each of which contains several samples of the same class, and then selects representative samples from these subspaces. This not only maintains the original data distribution structure and reduces data size but also reduces problem complexity and improves the learning efficiency of the classifier. Furthermore, CCS can be extended into an effective undersampling method (CCUS) to address class imbalance. Experiments on 18 KEEL and UCI datasets demonstrate that the proposed method outperforms other sampling methods in terms of F-measure, G-mean, AUC and Accuracy.
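The partition-then-select idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: the function name `covering_sample`, the choice of radius (distance from a randomly chosen center to its nearest different-class sample), and the choice of representative (the cover member closest to the cover's mean). It is not the authors' CCS implementation.

```python
import numpy as np

def covering_sample(X, y, seed=0):
    """Hypothetical sketch: build same-class covers, keep one sample per cover."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    uncovered = np.ones(len(X), dtype=bool)
    selected = []
    while uncovered.any():
        # Pick a random uncovered sample as the center of a new cover.
        center = rng.choice(np.flatnonzero(uncovered))
        dist = np.linalg.norm(X - X[center], axis=1)
        other = y != y[center]
        # Assumed radius: distance to the nearest different-class sample.
        radius = dist[other].min() if other.any() else np.inf
        # The cover holds uncovered same-class samples strictly inside the radius.
        members = uncovered & ~other & (dist < radius)
        members[center] = True  # guarantees progress even if radius is 0
        # Assumed representative: the member closest to the cover's mean.
        idx = np.flatnonzero(members)
        centroid = X[idx].mean(axis=0)
        rep = idx[np.argmin(np.linalg.norm(X[idx] - centroid, axis=1))]
        selected.append(int(rep))
        uncovered[members] = False
    return np.array(sorted(selected))

# Two well-separated classes collapse to one representative each.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y = [0, 0, 0, 1, 1, 1]
kept = covering_sample(X, y)
print(kept)
```

Under the same assumptions, an undersampling variant in the spirit of CCUS could keep every minority-class sample and retain only the representatives of majority-class covers; the abstract does not specify how CCUS does this.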
Pages: 476-489
Page count: 14