Residual diverse ensemble for long-tailed multi-label text classification

Cited by: 2
Authors
Shi, Jiangxin [1 ,2 ]
Wei, Tong [3 ,4 ]
Li, Yufeng [1 ,2 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Nanjing Univ, Sch Artificial Intelligence, Nanjing 210023, Peoples R China
[3] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Peoples R China
[4] Southeast Univ, Key Lab Comp Network & Informat Integrat, Minist Educ, Nanjing 210096, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
multi-label learning; extreme multi-label learning; long-tailed distribution; multi-label text classification; ensemble learning;
DOI
10.1007/s11432-022-3915-6
Chinese Library Classification code
TP [Automation and Computer Technology];
Discipline classification code
0812;
Abstract
Long-tailed multi-label text classification aims to identify a subset of relevant labels from a large candidate label set, where the training data usually follow a long-tailed label distribution. Many previous studies treat head and tail labels equally, resulting in unsatisfactory performance on tail labels. To address this issue, this paper proposes a novel learning method that combines arbitrary models with two steps. The first step is a "diverse ensemble" that encourages diverse predictions among multiple shallow classifiers, particularly on tail labels, which improves generalization on tail labels. The second step is an "error correction" that exploits the base model's accurate predictions on head labels and approximates its residual errors on tail labels, allowing the diverse ensemble to focus on optimizing tail-label performance. The overall procedure is called residual diverse ensemble (RDE). RDE is implemented via a single-hidden-layer perceptron and scales to hundreds of thousands of labels. We empirically show that RDE consistently improves many existing models with considerable performance gains on benchmark datasets, especially with respect to propensity-scored evaluation metrics. Moreover, RDE converges in fewer than 30 training epochs without increasing the computational overhead.
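
To make the two-step procedure described in the abstract concrete, below is a minimal PyTorch sketch of the RDE idea: an ensemble of single-hidden-layer perceptrons is trained to fit the residual errors of a frozen base model while a penalty pushes the members toward diverse predictions. All names, dimensions, the residual target, and the cosine-similarity diversity penalty here are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn

num_features, num_labels, num_learners, hidden = 512, 1000, 3, 256

# Ensemble of single-hidden-layer perceptrons; each member learns to
# predict the residual between the frozen base model's scores and the
# ground-truth labels ("error correction" on top of a "diverse ensemble").
learners = nn.ModuleList(
    nn.Sequential(nn.Linear(num_features, hidden), nn.ReLU(),
                  nn.Linear(hidden, num_labels))
    for _ in range(num_learners))
opt = torch.optim.Adam(learners.parameters(), lr=1e-3)

def rde_step(x, y, base_scores, diversity_weight=0.1):
    # Residual target: the part of the labels the base model gets wrong
    # (assumed form; the base model is accurate on head labels, so the
    # residual is concentrated on tail labels).
    residual_target = y - torch.sigmoid(base_scores)
    outs = [m(x) for m in learners]
    fit_loss = sum(nn.functional.mse_loss(o, residual_target) for o in outs)
    # Encourage disagreement among members (diversity) via a simple
    # pairwise cosine-similarity penalty (an illustrative choice).
    div = x.new_zeros(())
    for i in range(num_learners):
        for j in range(i + 1, num_learners):
            div = div + nn.functional.cosine_similarity(
                outs[i], outs[j], dim=1).mean()
    loss = fit_loss + diversity_weight * div
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def rde_predict(x, base_scores):
    # Corrected score = base model score + averaged residual prediction.
    with torch.no_grad():
        res = torch.stack([m(x) for m in learners]).mean(0)
    return torch.sigmoid(base_scores) + res

# Toy usage with random tensors standing in for document features,
# binary label vectors, and a frozen base model's raw scores.
x = torch.randn(8, num_features)
y = torch.randint(0, 2, (8, num_labels)).float()
base_scores = torch.randn(8, num_labels)
rde_step(x, y, base_scores)
scores = rde_predict(x, base_scores)

At prediction time the base model's head-label strengths are preserved, since the ensemble only adds an averaged residual on top of the base scores.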
Pages: 14