A Comparison Study of Cost-sensitive Learning and Sampling Methods on Imbalanced Data Sets

Cited by: 4
Authors
Zhang, Jinwei [1]
Lu, Huijuan [1]
Chen, Wutao [1]
Lu, Yi [2]
Affiliations
[1] China Jiliang Univ, Coll Informat Engn, Hangzhou 310018, Zhejiang, Peoples R China
[2] Prairie View A&M Univ, Dept Comp Sci, Prairie View, TX 77446 USA
Source
ADVANCED MATERIALS AND INFORMATION TECHNOLOGY PROCESSING, PTS 1-3 | 2011 / Vol. 271-273
Funding
Zhejiang Provincial Natural Science Foundation; National Natural Science Foundation of China;
Keywords
misclassification cost; cost-sensitive learning; over-sampling; under-sampling;
DOI
10.4028/www.scientific.net/AMR.271-273.1291
Chinese Library Classification number
G40 [Education];
Discipline classification code
040101 ; 120403 ;
Abstract
A classifier built from a data set with a highly skewed class distribution generally predicts an unknown sample as the majority class much more often than as the minority class, because classifiers are designed to maximize overall classification accuracy. We compare three classification methods for data sets in which the class distribution is imbalanced and the misclassification costs are non-uniform: a cost-sensitive learning method, in which the misclassification cost is embedded in the algorithm; an over-sampling method; and an under-sampling method. The aim is to determine which of the three produces the best overall classification under any circumstance. We draw the following conclusions: 1. Cost-sensitive learning is suitable for classifying imbalanced data sets. It outperforms the sampling methods overall and is more stable than they are, except when the data set is quite small. 2. If the data set is highly skewed or quite small, over-sampling methods may be better.
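The three families of methods named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not say which algorithms or cost values the paper used. The function names, the 95/5 toy class split, and the cost values are all assumptions made for illustration. In particular, the cost-sensitive part is shown as a generic minimum-expected-cost decision rule applied to a predicted probability, whereas the paper describes a method with the cost embedded in the learning algorithm itself.

```python
import random


def random_oversample(majority, minority, seed=0):
    """Duplicate random minority samples until both classes have equal size.

    Assumes len(majority) >= len(minority).
    """
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra


def random_undersample(majority, minority, seed=0):
    """Keep a random subset of the majority class, as large as the minority class.

    Assumes len(majority) >= len(minority).
    """
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), minority


def min_cost_label(p_minority, cost_fn, cost_fp):
    """Return the label with the lower expected misclassification cost.

    p_minority: estimated probability that the sample is in the minority class.
    cost_fn: cost of labelling a true minority sample as 'majority'.
    cost_fp: cost of labelling a true majority sample as 'minority'.
    """
    expected_cost_if_majority = p_minority * cost_fn
    expected_cost_if_minority = (1.0 - p_minority) * cost_fp
    if expected_cost_if_minority < expected_cost_if_majority:
        return "minority"
    return "majority"


# Toy imbalanced data set (hypothetical): 95 majority vs. 5 minority samples.
majority = [("maj", i) for i in range(95)]
minority = [("min", i) for i in range(5)]

over_maj, over_min = random_oversample(majority, minority)
under_maj, under_min = random_undersample(majority, minority)

# With equal costs, a sample with p_minority = 0.2 is labelled 'majority';
# making a missed minority sample 10x as costly flips the decision.
equal_cost_label = min_cost_label(0.2, cost_fn=1.0, cost_fp=1.0)
skewed_cost_label = min_cost_label(0.2, cost_fn=10.0, cost_fp=1.0)
```

Over-sampling keeps every original sample but repeats minority ones, under-sampling discards majority samples, and the cost-based rule leaves the data untouched and changes only the decision.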
Pages: 1291 / +
Page count: 3
Related papers
50 in total
  • [31] Constraint relaxation, cost-sensitive learning and bagging for imbalanced classification problems with outliers
    Talayeh Razzaghi
    Petros Xanthopoulos
    Onur Şeref
    [J]. Optimization Letters, 2017, 11 : 915 - 928
  • [32] Multi-view cost-sensitive kernel learning for imbalanced classification problem
    Tang, Jingjing
    Hou, Zhaojie
    Yu, Xiaotong
    Fu, Saiji
    Tian, Yingjie
    [J]. NEUROCOMPUTING, 2023, 552
  • [33] Cost-Sensitive Learning from Imbalanced Datasets for Retail Credit Risk Assessment
    Oreski, Stjepan
    Oreski, Goran
    [J]. TEM JOURNAL-TECHNOLOGY EDUCATION MANAGEMENT INFORMATICS, 2018, 7 (01): : 59 - 73
  • [34] Reinforcement learning-based cost-sensitive classifier for imbalanced fault classification
    Xinmin Zhang
    Saite Fan
    Zhihuan Song
    [J]. Science China Information Sciences, 2023, 66
  • [35] Focused Anchors Loss: cost-sensitive learning of discriminative features for imbalanced classification
    Baloch, Bahram K.
    Kumar, Sateesh
    Haresh, Sanjay
    Rehman, Abeerah
    Syed, Tahir
    [J]. ASIAN CONFERENCE ON MACHINE LEARNING, VOL 101, 2019, 101 : 837 - 850
  • [36] Reinforcement learning-based cost-sensitive classifier for imbalanced fault classification
    Zhang, Xinmin
    Fan, Saite
    Song, Zhihuan
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2023, 66 (11)
  • [37] Cost-Sensitive Broad Learning System for Imbalanced Classification and Its Medical Application
    Yao, Liang
    Wong, Pak Kin
    Zhao, Baoliang
    Wang, Ziwen
    Lei, Long
    Wang, Xiaozheng
    Hu, Ying
    [J]. MATHEMATICS, 2022, 10 (05)
  • [38] Cost-Sensitive Learning of Fuzzy Rules for Imbalanced Classification Problems Using FURIA
    Palacios, Ana
    Trawinski, Krzysztof
    Cordon, Oscar
    Sanchez, Luciano
    [J]. INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2014, 22 (05) : 643 - 675
  • [39] Constraint relaxation, cost-sensitive learning and bagging for imbalanced classification problems with outliers
    Razzaghi, Talayeh
    Xanthopoulos, Petros
    Seref, Onur
    [J]. OPTIMIZATION LETTERS, 2017, 11 (05) : 915 - 928
  • [40] COST-SENSITIVE SPARSE LINEAR REGRESSION FOR CROWD COUNTING WITH IMBALANCED TRAINING DATA
    Huang, Xiaolin
    Zou, Yuexian
    Wang, Yi
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), 2016,