Multi-label sampling based on local label imbalance

Cited by: 35
Authors
Liu, Bin [1]
Blekas, Konstantinos [2]
Tsoumakas, Grigorios [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Sch Informat, Thessaloniki 54124, Greece
[2] Univ Ioannina, Dept Comp Sci & Engn, Ioannina 45110, Greece
Keywords
Multi-label learning; Class imbalance; Oversampling and undersampling; Local label imbalance; Ensemble methods; Classification
DOI
10.1016/j.patcog.2021.108294
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Class imbalance is an inherent characteristic of multi-label data that hinders most multi-label learning methods. One efficient and flexible strategy to deal with this problem is to employ sampling techniques before training a multi-label learning model. Although existing multi-label sampling approaches alleviate the global imbalance of multi-label datasets, it is actually the imbalance level within the local neighbourhood of minority class examples that plays a key role in performance degradation. To address this issue, we propose a novel measure to assess the local label imbalance of multi-label datasets, as well as two multi-label sampling approaches, namely Multi-Label Synthetic Oversampling based on Local label imbalance (MLSOL) and Multi-Label Undersampling based on Local label imbalance (MLUL). By considering all informative labels, MLSOL creates more diverse and better labeled synthetic instances for difficult examples, while MLUL eliminates instances that are harmful to their local region. Experimental results on 13 multi-label datasets demonstrate the effectiveness of the proposed measure and sampling approaches for a variety of evaluation metrics, particularly in the case of an ensemble of classifiers trained on repeated samples of the original data. (c) 2021 Elsevier Ltd. All rights reserved.
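The abstract does not give the definition of the proposed local imbalance measure, so the following is only a rough illustration of the idea of "imbalance within the local neighbourhood". It assumes, purely for illustration, that the local imbalance of an instance for a given label can be approximated by the fraction of its k nearest neighbours that carry the opposite value for that label. The function name `local_label_imbalance`, the Euclidean distance, and the toy data are all hypothetical and not taken from the paper.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def local_label_imbalance(X, Y, k):
    """For each instance i and label j, return the fraction of the k nearest
    neighbours of i whose value for label j differs from i's own value.

    Illustrative proxy only; this is NOT the measure defined in the paper.
    Requires 1 <= k <= len(X) - 1.
    """
    n = len(X)
    scores = []
    for i in range(n):
        # Rank all other instances by distance to instance i and keep the k closest.
        others = sorted((m for m in range(n) if m != i),
                        key=lambda m: dist(X[i], X[m]))
        neighbours = others[:k]
        # Per label: share of neighbours that disagree with instance i.
        row = [sum(Y[m][j] != Y[i][j] for m in neighbours) / len(neighbours)
               for j in range(len(Y[i]))]
        scores.append(row)
    return scores


# Toy data (hypothetical): four one-dimensional points, two binary labels each.
X = [(0.0,), (1.0,), (2.0,), (10.0,)]
Y = [(1, 0), (0, 0), (0, 0), (1, 1)]
scores = local_label_imbalance(X, Y, k=2)
```

Under this assumed proxy, a score near 1 marks an instance whose neighbours mostly disagree with it on that label, which is the kind of "difficult example" the abstract says MLSOL focuses on, while a score near 0 marks an instance in a locally consistent region.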
Pages: 12
Related papers
38 in total
  • [1] Benavoli A, 2016, J MACH LEARN RES, V17
  • [2] Learning multi-label scene classification
    Boutell, MR
    Luo, JB
    Shen, XP
    Brown, CM
    [J]. PATTERN RECOGNITION, 2004, 37 (09) : 1757 - 1771
  • [3] Cost Sensitive Ranking Support Vector Machine for Multi-label Data Learning
    Cao, Peng
    Liu, Xiaoli
    Zhao, Dazhe
    Zaiane, Osmar
    [J]. PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS 2016), 2017, 552 : 244 - 255
  • [4] REMEDIAL-HwR: Tackling multilabel imbalance through label decoupling and data resampling hybridization
    Charte, Francisco
    Rivera, Antonio J.
    del Jesus, Maria J.
    Herrera, Francisco
    [J]. NEUROCOMPUTING, 2019, 326 : 110 - 122
  • [5] Dealing with difficult minority labels in imbalanced mutilabel data sets
    Charte, Francisco
    Rivera, Antonio J.
    del Jesus, Maria J.
    Herrera, Francisco
    [J]. NEUROCOMPUTING, 2019, 326 : 39 - 53
  • [6] MLSMOTE: Approaching imbalanced multilabel learning through synthetic instance generation
    Charte, Francisco
    Rivera, Antonio J.
    del Jesus, Maria J.
    Herrera, Francisco
    [J]. KNOWLEDGE-BASED SYSTEMS, 2015, 89 : 385 - 397
  • [7] Addressing imbalance in multilabel classification: Measures and random resampling algorithms
    Charte, Francisco
    Rivera, Antonio J.
    del Jesus, Maria J.
    Herrera, Francisco
    [J]. NEUROCOMPUTING, 2015, 163 : 3 - 16
  • [8] Charte F, 2014, LECT NOTES COMPUT SC, V8669, P1, DOI 10.1007/978-3-319-10840-7_1
  • [9] Charte F, 2013, LECT NOTES COMPUT SC, V8073, P150, DOI 10.1007/978-3-642-40846-5_16
  • [10] SMOTE: Synthetic minority over-sampling technique
    Chawla, Nitesh V.
    Bowyer, Kevin W.
    Hall, Lawrence O.
    Kegelmeyer, W. Philip
    [J]. 2002, American Association for Artificial Intelligence (16)