Combating Mutuality with Difficulty Factors in Multi-class Imbalanced Data: A Similarity-based Hybrid Sampling

Times cited: 2
Authors
Zheng, Zhong [1]
Yan, Yuanting [1]
Zhang, Yiwen [1]
Zhang, Yanping [1]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Hefei, Peoples R China
Source
2022 IEEE 9TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA) | 2022
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
multi-class; imbalanced data; sampling; classification; sample similarity; data difficulty factors; ROC CURVE; CLASSIFICATION; SMOTE; AREA;
DOI
10.1109/DSAA54385.2022.10032369
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The multi-class imbalance problem is widespread in real-life applications and remains a challenging issue. Existing sampling methods, including decomposition approaches and dedicated approaches, are limited in handling the complex mutual relationships together with data difficulty factors. In fact, the relative minorities are critical in these mutual relationships, and data difficulty factors are harmful to these minority classes. In this paper, we propose SHSampler, a similarity-based hybrid sampling method that combats mutuality by addressing data difficulty factors in multi-class imbalanced data. Specifically, SHSampler first uses sample similarity and dissimilarity estimation to identify data difficulty factors. It then performs undersampling that weakens the relative majorities and oversampling that strengthens the relative minorities, reducing the negative impact of data difficulty factors and highlighting the importance of the minorities. Extensive experiments on 20 typical datasets demonstrate the superiority of SHSampler over 6 state-of-the-art methods in terms of MAUC and mGM.
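The abstract reports results in mGM without defining it. As a minimal sketch, assuming mGM denotes the geometric mean of per-class recalls (the usual meaning in the multi-class imbalance literature, not confirmed by this record), it could be computed as below. The function name `mean_geometric_mean` and the toy labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mean_geometric_mean(y_true, y_pred):
    """Geometric mean of per-class recalls over the classes present in y_true.

    Assumption: this is what "mGM" refers to; the record itself does not define it.
    """
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    recalls = [hits[c] / support[c] for c in support]
    if min(recalls) == 0.0:
        return 0.0  # one completely missed class drives the score to zero
    return math.exp(sum(math.log(r) for r in recalls) / len(recalls))


# Toy imbalanced problem: 8 / 4 / 2 samples in classes 0 / 1 / 2.
y_true = [0] * 8 + [1] * 4 + [2] * 2
y_pred = [0] * 8 + [1, 1, 0, 0] + [2, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
mgm = mean_geometric_mean(y_true, y_pred)
print(f"accuracy={accuracy:.3f}  mGM={mgm:.3f}")
```

In this toy case the per-class recalls are 1.0, 0.5, and 0.5, so the score is pulled down by the two minority classes even though most samples are classified correctly. That sensitivity to minority-class recall is the usual reason for preferring such a metric over plain accuracy on imbalanced data.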
Pages: 387-396
Page count: 10