Resampling Multilabel Datasets by Decoupling Highly Imbalanced Labels

Cited by: 11
Authors
Charte, Francisco [1]
Rivera, Antonio [2]
del Jesus, Maria Jose [2]
Herrera, Francisco [1]
Affiliations
[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, Granada, Spain
[2] Univ Jaen, Dept Comp Sci, Jaen, Spain
Source
HYBRID ARTIFICIAL INTELLIGENT SYSTEMS (HAIS 2015) | 2015 / Vol. 9121
Keywords
Multilabel classification; Imbalanced learning; Resampling; Label concurrence; Neural networks; Classification
DOI
10.1007/978-3-319-19644-2_41
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multilabel classification has been broadly studied in recent years. However, how to learn from imbalanced multilabel datasets (MLDs) has only been addressed recently. A few proposals can be found in the literature, most of them based on resampling techniques adapted from traditional classification. The success of these methods varies widely depending on the traits of the chosen MLDs. One characteristic that significantly influences the behavior of multilabel resampling algorithms is the joint appearance of minority and majority labels in the same instances. It has been demonstrated that MLDs with a high level of concurrence among imbalanced labels can hardly benefit from resampling methods. This paper proposes an original resampling algorithm, called REMEDIAL, which is based neither on removing majority instances nor on creating minority ones, but on a procedure to decouple highly imbalanced labels. As is experimentally demonstrated, this is an interesting approach for certain MLDs.
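The abstract does not spell out the decoupling procedure, so the following Python sketch only illustrates the general idea and should not be read as the published REMEDIAL algorithm. It assumes that a label counts as "minority" when its imbalance ratio (count of the most frequent label divided by the label's own count) exceeds the mean ratio, and that every instance mixing minority and majority labels is split. The names `imbalance_ratios` and `decouple`, and the choice to split all mixed instances, are hypothetical simplifications.

```python
from typing import List, Tuple

Row = Tuple[List[float], List[int]]  # (features, binary label vector)


def imbalance_ratios(labels: List[List[int]]) -> List[float]:
    """Per-label ratio: count of the most frequent label / count of this label."""
    counts = [sum(col) for col in zip(*labels)]
    top = max(counts)
    # A label that never appears gets an infinite ratio.
    return [top / c if c else float("inf") for c in counts]


def decouple(rows: List[Row]) -> List[Row]:
    """Split each row that mixes minority and majority labels into two rows.

    Assumption (not from the abstract): a label is "minority" when its
    imbalance ratio is above the mean ratio of the labels that appear.
    """
    ir = imbalance_ratios([y for _, y in rows])
    finite = [r for r in ir if r != float("inf")]
    if not finite:
        return list(rows)  # no label is ever active, nothing to decouple
    mean_ir = sum(finite) / len(finite)
    minority = [r > mean_ir for r in ir]

    out: List[Row] = []
    for x, y in rows:
        min_part = [v if m else 0 for v, m in zip(y, minority)]
        maj_part = [0 if m else v for v, m in zip(y, minority)]
        if any(min_part) and any(maj_part):
            out.append((x, min_part))        # original keeps only minority labels
            out.append((list(x), maj_part))  # clone keeps only majority labels
        else:
            out.append((x, y))               # unmixed rows pass through unchanged
    return out


rows = [([0.1], [1, 0]), ([0.2], [1, 0]), ([0.3], [1, 0]), ([0.4], [1, 1])]
result = decouple(rows)
```

In this toy dataset the last row carries both the frequent and the rare label, so it is replaced by two rows with the same features, one per label. No instance is deleted and no synthetic feature vector is created, and the per-label counts stay the same, which matches the abstract's description of a method that neither removes majority instances nor creates minority ones.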
Pages: 489-501
Page count: 13