A novel progressively undersampling method based on the density peaks sequence for imbalanced data

Cited by: 60
Authors
Xie, Xiaoying [1 ]
Liu, Huawen [2 ]
Zeng, Shouzhen [3 ]
Lin, Lingbin [4 ]
Li, Wen [5 ]
Affiliations
[1] Zhejiang Normal Univ, Coll Econ & Management, Jinhua 321004, Zhejiang, Peoples R China
[2] Zhejiang Normal Univ, Coll Math & Comp Sci, Jinhua 321004, Zhejiang, Peoples R China
[3] Ningbo Univ, Sch Business, Ningbo 315211, Peoples R China
[4] Zhejiang Normal Univ, Student Management Off, Jinhua 321004, Zhejiang, Peoples R China
[5] Curtin Univ, Dept Math & Stat, Perth, WA 6845, Australia
Keywords
Progressive undersampling; Density peaks sequence; Importance degree; Optimal undersampling size; Imbalanced data;
DOI
10.1016/j.knosys.2020.106689
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Undersampling is a widely used resampling technique for imbalanced data. Because traditional undersampling techniques, which typically reduce the majority class to the same scale as the minority class, tend to discard valuable information, many strategies such as clustering have been developed. However, two essential problems remain open: which instances should be extracted in undersampling, and how many. To alleviate these two problems, in this paper we propose a novel undersampling method for imbalanced data. It exploits a sequence of density peaks to progressively extract instances from the majority classes of the imbalanced data. Specifically, two factors are introduced to measure the importance degree of each instance in the majority classes. With these two factors, we generate a sampling sequence ordered by the importance of instances for classification. Furthermore, the optimal undersampling size of the majority classes is determined automatically by progressively extracting the most important instances from the sequence. To evaluate the effectiveness of the proposed method, a series of experiments comparing it with six popular undersampling methods was conducted on 40 public benchmark datasets. The experimental results show that the proposed undersampling method outperforms the state-of-the-art undersampling methods. (C) 2020 Elsevier B.V. All rights reserved.
Pages: 11
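The abstract describes ranking majority-class instances by a density-peaks-based importance degree and progressively growing the sampled subset until an optimal undersampling size is reached. Below is a minimal, illustrative sketch of that general idea in Python. It assumes the two importance factors are the classic density-peaks quantities (local density rho and distance delta to the nearest denser point) combined as rho * delta, and that the undersampling size is chosen by macro-F1 on a held-out split; the paper's actual factors, scoring, and size-selection rule may differ, and the function names here are hypothetical.

```python
# Illustrative sketch only: density-peaks-style progressive undersampling.
# The factors rho and delta follow the density-peaks clustering idea; the
# paper's actual importance measure and stopping rule may differ.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, pairwise_distances
from sklearn.model_selection import train_test_split


def density_peaks_order(X_maj, cutoff_quantile=0.02):
    """Order majority instances by an assumed importance score rho * delta."""
    D = pairwise_distances(X_maj)
    d_c = np.quantile(D[D > 0], cutoff_quantile)           # cutoff distance
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0         # Gaussian local density
    delta = np.empty(len(X_maj))
    for i in range(len(X_maj)):
        denser = np.flatnonzero(rho > rho[i])                # points with higher density
        delta[i] = D[i].max() if denser.size == 0 else D[i, denser].min()
    return np.argsort(-(rho * delta))                        # most important first


def progressive_undersample(X, y, majority_label, step_frac=0.1, seed=0):
    """Grow the majority subset along the ranking; keep the size with best macro-F1."""
    maj = np.flatnonzero(y == majority_label)
    mino = np.flatnonzero(y != majority_label)
    order = maj[density_peaks_order(X[maj])]
    step = max(1, int(step_frac * len(order)))
    best_f1, best_subset = -1.0, None
    for k in range(len(mino), len(order) + 1, step):         # start from a balanced set
        subset = np.concatenate([order[:k], mino])
        X_tr, X_va, y_tr, y_va = train_test_split(
            X[subset], y[subset], test_size=0.3,
            stratify=y[subset], random_state=seed)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        f1 = f1_score(y_va, clf.predict(X_va), average="macro")
        if f1 > best_f1:
            best_f1, best_subset = f1, subset
    return X[best_subset], y[best_subset]


if __name__ == "__main__":
    # Toy imbalanced dataset: roughly 90% majority (label 0), 10% minority (label 1).
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
    X_res, y_res = progressive_undersample(X, y, majority_label=0)
    print("resampled class counts:", np.bincount(y_res))
```

The rho * delta score is only one plausible reading of the two importance factors mentioned in the abstract; any monotone combination of density and separation would fit the same progressive scheme.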