Oversampling multi-label data based on natural neighbor and label correlation

Cited by: 0
Authors
Liu, Bin [1]
Zhou, Ao [1]
Wei, Bingkun [1]
Wang, Jin [1]
Tsoumakas, Grigorios [2]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Sch Comp Sci Technol, Key Lab Data Engn & Visual Comp, Chongqing, Peoples R China
[2] Aristotle Univ Thessaloniki, Sch Informat, Thessaloniki, Greece
Keywords
Multi-label learning; Class imbalance; Oversampling; Natural neighbor; Label correlation; CLASSIFICATION; TRANSFORMER; IMBALANCE; SMOTE
DOI
10.1016/j.eswa.2024.125257
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In multi-label learning, addressing the class imbalance issue is of paramount importance. Oversampling methods are preferred because they offer a more general solution that is independent of the model choice, i.e., they alleviate the imbalance of datasets by augmenting instances in the pre-processing step. Existing neighbor-based oversampling methods employ an empirical number of neighbors (k = 5) to identify the local region for creating new instances. However, a single fixed k value cannot fit all labels, because every label usually has its own distinct distribution and complexity. Furthermore, the label assignment for synthetic instances usually depends on the statistics of individual labels within the corresponding neighborhood, ignoring the informative correlation among labels. To overcome these limitations, we propose an oversampling method called Multi-Label Oversampling with Natural neighbor and label Correlation (MLONC). Our approach offers three main advantages: (1) an adaptive number of neighbors for each label, related to the data complexity, is obtained via natural neighbor detection; (2) it encourages generating more instances close to the decision boundary of highly imbalanced labels and diminishes the impact of outliers; (3) exploiting label correlation in label assignment enhances the quality of the synthetic instances. Experimental results demonstrate the effectiveness of MLONC under various base classifiers.
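The idea of replacing a fixed k with a data-driven neighborhood size can be illustrated with a simplified natural neighbor search. The sketch below is not the MLONC procedure from the paper, which the abstract does not spell out. It assumes one common formulation: the search radius r grows one neighbor at a time, and it stops once every point has been chosen as a neighbor by some other point, or once the number of never-chosen points stops changing. The function name `natural_neighbor_search` and the `max_rounds` parameter are hypothetical.

```python
import numpy as np

def natural_neighbor_search(X, max_rounds=None):
    """Simplified natural neighbor search (assumed stopping rule, not MLONC itself).

    Returns the adaptive neighborhood size r and, for each point, the sorted
    list of its natural (mutual r-nearest) neighbors.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if max_rounds is None:
        max_rounds = n - 1

    # Pairwise Euclidean distances; a point is never its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    # order[i, j] is the index of the (j + 1)-th nearest neighbor of point i.
    order = np.argsort(dist, axis=1, kind="stable")

    reverse_count = np.zeros(n, dtype=int)  # how often each point was chosen
    prev_orphans = -1
    r = 0
    for r in range(1, max_rounds + 1):
        # In round r, every point selects its r-th nearest neighbor.
        np.add.at(reverse_count, order[:, r - 1], 1)
        orphans = int(np.sum(reverse_count == 0))
        # Stop when nobody is left unchosen, or the unchosen set is stable.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans

    # Natural neighbors: pairs that appear in each other's r-nearest lists.
    knn = [set(order[i, :r].tolist()) for i in range(n)]
    natural = [sorted(j for j in knn[i] if i in knn[j]) for i in range(n)]
    return r, natural


if __name__ == "__main__":
    # Three clustered points and one distant outlier (toy data).
    X = np.array([[0.0], [1.0], [2.1], [10.0]])
    r, natural = natural_neighbor_search(X)
    print(r, natural)
```

On this toy data the distant point ends up with no natural neighbors, which hints at how a neighborhood defined this way can limit the influence of outliers. According to the abstract, MLONC derives the number of neighbors separately for each label, so a per-label variant would presumably run such a search on the instances relevant to that label.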
Pages: 16