A Comparison Study of Cost-sensitive Learning and Sampling Methods on Imbalanced Data Sets

Cited by: 4
Authors
Zhang, Jinwei [1]
Lu, Huijuan [1]
Chen, Wutao [1]
Lu, Yi [2]
Affiliations
[1] China Jiliang Univ, Coll Informat Engn, Hangzhou 310018, Zhejiang, Peoples R China
[2] Prairie View A&M Univ, Dept Comp Sci, Prairie View, TX 77446 USA
Source
ADVANCED MATERIALS AND INFORMATION TECHNOLOGY PROCESSING, PTS 1-3 | 2011 / Vols. 271-273
Funding
Natural Science Foundation of Zhejiang Province; National Natural Science Foundation of China
Keywords
misclassification cost; cost-sensitive learning; over-sampling; under-sampling
DOI
10.4028/www.scientific.net/AMR.271-273.1291
CLC number
G40 [Education]
Discipline code
040101; 120403
Abstract
A classifier built from a data set with a highly skewed class distribution generally predicts unknown samples as the majority class much more often than as the minority class, because the classifier is designed to maximize overall classification accuracy. We compare three methods for classifying data sets whose class distribution is imbalanced and whose misclassification costs are non-uniform: a cost-sensitive learning method, in which the misclassification cost is embedded in the algorithm; an over-sampling method; and an under-sampling method. The aim is to determine which of the three produces the best overall classification under different conditions. We reach the following conclusions. 1. Cost-sensitive learning is suitable for classifying imbalanced data sets: overall it outperforms the sampling methods, and it is more stable than they are except when the data set is quite small. 2. If the data set is highly skewed or quite small, over-sampling may be the better choice.
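The three families of methods compared above can be illustrated with a minimal Python sketch. The abstract does not say which sampling or cost-sensitive algorithms were used, so everything here is an assumption for illustration: random over-sampling (duplicating minority samples), random under-sampling (discarding majority samples), and a minimum-expected-cost decision rule applied to class-probability estimates. The last of these is a post-hoc way of using a cost matrix, not the embedded-cost learner the paper evaluates. The function names `random_over_sample`, `random_under_sample`, and `min_cost_label` are hypothetical.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Return a dict mapping each label to the list of its samples (insertion order kept)."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_over_sample(samples, labels, seed=0):
    """Random over-sampling: duplicate randomly chosen samples of every smaller
    class until each class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def random_under_sample(samples, labels, seed=0):
    """Random under-sampling: keep a random subset of every larger class so that
    each class is as small as the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        out_x.extend(rng.sample(xs, target))
        out_y.extend([y] * target)
    return out_x, out_y


def min_cost_label(probs, cost):
    """Minimum-expected-cost decision rule (an assumed, post-hoc use of costs).

    probs[j]   -- estimated probability that the true class is j
    cost[i][j] -- cost of predicting class i when the true class is j
    Returns the predicted class i with the lowest sum_j probs[j] * cost[i][j].
    """
    expected = {i: sum(p * cost[i][j] for j, p in probs.items()) for i in cost}
    return min(expected, key=expected.get)


if __name__ == "__main__":
    samples = list(range(10))
    labels = [0] * 8 + [1] * 2          # 8 majority vs. 2 minority samples

    _, over_y = random_over_sample(samples, labels)
    _, under_y = random_under_sample(samples, labels)
    print(Counter(over_y))    # Counter({0: 8, 1: 8})
    print(Counter(under_y))   # Counter({0: 2, 1: 2})

    probs = {0: 0.8, 1: 0.2}            # the majority class is estimated as likelier
    equal = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
    skewed = {0: {0: 0, 1: 10}, 1: {0: 1, 1: 0}}  # missing a minority sample costs 10x
    print(min_cost_label(probs, equal))   # 0
    print(min_cost_label(probs, skewed))  # 1
```

The last two calls mirror the behaviour described in the abstract: with equal costs the rule picks the majority class whenever it is more probable (the accuracy-maximizing choice), while a higher cost for misclassifying the minority class shifts the decision toward it.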
Pages: 1291+
Number of pages: 3