Residual diverse ensemble for long-tailed multi-label text classification

Cited by: 2
Authors
Shi, Jiangxin [1 ,2 ]
Wei, Tong [3 ,4 ]
Li, Yufeng [1 ,2 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Nanjing Univ, Sch Artificial Intelligence, Nanjing 210023, Peoples R China
[3] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Peoples R China
[4] Southeast Univ, Key Lab Comp Network & Informat Integrat, Minist Educ, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
multi-label learning; extreme multi-label learning; long-tailed distribution; multi-label text classification; ensemble learning;
DOI
10.1007/s11432-022-3915-6
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Long-tailed multi-label text classification aims to identify a subset of relevant labels from a large candidate label set, where the training data usually follow a long-tailed label distribution. Many previous studies treat head and tail labels equally, resulting in unsatisfactory performance on tail labels. To address this issue, this paper proposes a novel learning method that can be combined with arbitrary base models and consists of two steps. The first step is the "diverse ensemble", which encourages diverse predictions among multiple shallow classifiers, particularly on tail labels, thereby improving generalization on tail labels. The second step is the "error correction", which exploits the base model's accurate predictions on head labels and approximates its residual errors on tail labels; this allows the diverse ensemble to focus on optimizing tail-label performance. The overall procedure is called the residual diverse ensemble (RDE). RDE is implemented with single-hidden-layer perceptrons and scales to hundreds of thousands of labels. We empirically show that RDE consistently improves many existing models, with considerable performance gains on benchmark datasets, especially under propensity-scored evaluation metrics. Moreover, RDE converges within 30 training epochs without increasing computational overhead.
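The record gives only a high-level description of RDE, so the following is a minimal PyTorch sketch of the two steps, not the authors' implementation. It assumes the residual correction is a mean over learner outputs restricted to tail labels by a 0/1 tail_mask, and that diversity is encouraged by a negative-variance penalty weighted by diversity_weight; all names (ShallowClassifier, ResidualDiverseEnsemble, rde_loss) and both modeling choices are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowClassifier(nn.Module):
    # Single-hidden-layer perceptron, matching the record's description
    # of the base learner used by RDE.
    def __init__(self, in_dim, hidden_dim, num_labels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_labels),
        )

    def forward(self, x):
        return self.net(x)

class ResidualDiverseEnsemble(nn.Module):
    # Ensemble of shallow classifiers that predicts residual corrections
    # to a (frozen) base model's scores, applied only to tail labels.
    def __init__(self, in_dim, hidden_dim, num_labels, tail_mask, num_learners=3):
        super().__init__()
        self.learners = nn.ModuleList(
            ShallowClassifier(in_dim, hidden_dim, num_labels)
            for _ in range(num_learners)
        )
        # tail_mask: 1.0 for tail labels, 0.0 for head labels (hypothetical split).
        self.register_buffer("tail_mask", tail_mask)

    def forward(self, x, base_logits):
        # Each learner predicts a residual over all labels: shape (K, B, L).
        residuals = torch.stack([m(x) for m in self.learners])
        # Mean-combined correction, restricted to tail labels.
        correction = residuals.mean(dim=0) * self.tail_mask
        return base_logits + correction, residuals

def rde_loss(corrected_logits, residuals, targets, diversity_weight=0.1):
    # Fit term: corrected scores should match the multi-label targets.
    fit = F.binary_cross_entropy_with_logits(corrected_logits, targets)
    # Diversity term (an assumption): penalize agreement among learners
    # by rewarding variance of their residual predictions.
    diversity = -residuals.var(dim=0).mean()
    return fit + diversity_weight * diversity

# Toy usage with random tensors; a real base model would supply base_logits.
B, D, H, L = 32, 300, 128, 1000
tail_mask = torch.zeros(L)
tail_mask[100:] = 1.0  # assume labels 100+ are tail labels
model = ResidualDiverseEnsemble(D, H, L, tail_mask=tail_mask)
x = torch.randn(B, D)
y = (torch.rand(B, L) < 0.01).float()   # sparse multi-label targets
base_logits = torch.randn(B, L)         # stand-in for frozen base model scores
logits, residuals = model(x, base_logits)
loss = rde_loss(logits, residuals, y)
loss.backward()

Because only the residual correction (masked to tail labels) is trained while the base model's head-label scores pass through unchanged, the ensemble's capacity is spent entirely on tail labels, which is the division of labor the abstract describes.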
Pages: 14