Residual diverse ensemble for long-tailed multi-label text classification

Cited by: 2
Authors
Shi, Jiangxin [1 ,2 ]
Wei, Tong [3 ,4 ]
Li, Yufeng [1 ,2 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Nanjing Univ, Sch Artificial Intelligence, Nanjing 210023, Peoples R China
[3] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Peoples R China
[4] Southeast Univ, Key Lab Comp Network & Informat Integrat, Minist Educ, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
multi-label learning; extreme multi-label learning; long-tailed distribution; multi-label text classification; ensemble learning;
DOI
10.1007/s11432-022-3915-6
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Long-tailed multi-label text classification aims to identify a subset of relevant labels from a large candidate label set, where the training data usually follow a long-tailed label distribution. Many previous studies treat head and tail labels equally, resulting in unsatisfactory performance on tail labels. To address this issue, this paper proposes a novel learning method that can be combined with arbitrary base models and consists of two steps. The first step is the "diverse ensemble", which encourages diverse predictions among multiple shallow classifiers, particularly on tail labels, thereby improving generalization on tail labels. The second step is the "error correction", which exploits the base model's accurate predictions on head labels and approximates its residual errors on tail labels; this allows the diverse ensemble to focus on optimizing tail-label performance. The overall procedure is called the residual diverse ensemble (RDE). RDE is implemented with single-hidden-layer perceptrons and scales to hundreds of thousands of labels. We empirically show that RDE consistently improves many existing models, with considerable performance gains on benchmark datasets, especially under propensity-scored evaluation metrics. Moreover, RDE converges within 30 training epochs without increasing computational overhead.
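The record gives only a high-level description of RDE, so the following is a minimal PyTorch sketch of the two steps, not the authors' implementation. It assumes the residual correction is a mean over learner outputs restricted to tail labels by a 0/1 tail_mask, and that diversity is encouraged by a negative-variance penalty weighted by diversity_weight; all names (ShallowClassifier, ResidualDiverseEnsemble, rde_loss) and both modeling choices are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowClassifier(nn.Module):
    # Single-hidden-layer perceptron, matching the record's description
    # of the base learner used by RDE.
    def __init__(self, in_dim, hidden_dim, num_labels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_labels),
        )

    def forward(self, x):
        return self.net(x)

class ResidualDiverseEnsemble(nn.Module):
    # Ensemble of shallow classifiers that predicts residual corrections
    # to a (frozen) base model's scores, applied only to tail labels.
    def __init__(self, in_dim, hidden_dim, num_labels, tail_mask, num_learners=3):
        super().__init__()
        self.learners = nn.ModuleList(
            ShallowClassifier(in_dim, hidden_dim, num_labels)
            for _ in range(num_learners)
        )
        # tail_mask: 1.0 for tail labels, 0.0 for head labels (hypothetical split).
        self.register_buffer("tail_mask", tail_mask)

    def forward(self, x, base_logits):
        # Each learner predicts a residual over all labels: shape (K, B, L).
        residuals = torch.stack([m(x) for m in self.learners])
        # Mean-combined correction, restricted to tail labels.
        correction = residuals.mean(dim=0) * self.tail_mask
        return base_logits + correction, residuals

def rde_loss(corrected_logits, residuals, targets, diversity_weight=0.1):
    # Fit term: corrected scores should match the multi-label targets.
    fit = F.binary_cross_entropy_with_logits(corrected_logits, targets)
    # Diversity term (an assumption): penalize agreement among learners
    # by rewarding variance of their residual predictions.
    diversity = -residuals.var(dim=0).mean()
    return fit + diversity_weight * diversity

# Toy usage with random tensors; a real base model would supply base_logits.
B, D, H, L = 32, 300, 128, 1000
tail_mask = torch.zeros(L)
tail_mask[100:] = 1.0  # assume labels 100+ are tail labels
model = ResidualDiverseEnsemble(D, H, L, tail_mask=tail_mask)
x = torch.randn(B, D)
y = (torch.rand(B, L) < 0.01).float()   # sparse multi-label targets
base_logits = torch.randn(B, L)         # stand-in for frozen base model scores
logits, residuals = model(x, base_logits)
loss = rde_loss(logits, residuals, y)
loss.backward()

Because only the residual correction (masked to tail labels) is trained while the base model's head-label scores pass through unchanged, the ensemble's capacity is spent entirely on tail labels, which is the division of labor the abstract describes.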
Pages: 14