LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification

Cited by: 0
Authors
Jiang, Ting [1 ]
Wang, Deqing [1 ]
Sun, Leilei [1 ]
Yang, Huayi [1 ]
Zhao, Zhengyang [1 ]
Zhuang, Fuzhen [2 ,3 ]
Affiliations
[1] Beihang University, SKLSDE & BDBC Lab, Beijing, China
[2] Chinese Academy of Sciences, Institute of Computing Technology, Key Lab of Intelligent Information Processing, Beijing, China
[3] Capital Normal University, Academy for Multidisciplinary Studies, Beijing Advanced Innovation Center for Imaging Theory and Technology, Beijing, China
Source
Thirty-Fifth AAAI Conference on Artificial Intelligence, Thirty-Third Conference on Innovative Applications of Artificial Intelligence and the Eleventh Symposium on Educational Advances in Artificial Intelligence | 2021, Vol. 35
Funding
National Key R&D Program of China; National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Extreme Multi-label text Classification (XMC) is the task of finding the most relevant labels for a text from a very large label set. Deep learning-based methods have recently shown significant success in XMC. However, existing methods (e.g., AttentionXML and X-Transformer) still suffer from 1) combining several models to train and predict on a single dataset, and 2) sampling negative labels statically while training the label ranking model, both of which reduce the model's efficiency and accuracy. To address these problems, we propose LightXML, which adopts end-to-end training and dynamic negative label sampling. LightXML uses generative cooperative networks to recall and rank labels: the label recalling part generates a candidate set containing positive and negative labels, and the label ranking part distinguishes the positive labels within it. Because both parts are fed the same text representation, negative labels are sampled dynamically as the label ranking part is trained. Extensive experiments show that LightXML outperforms state-of-the-art methods on five extreme multi-label datasets with much smaller model size and lower computational complexity. In particular, on the Amazon dataset with 670K labels, LightXML reduces the model size by up to 72% compared to AttentionXML. Our code is available at http://github.com/kongds/LightXML.
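To make the dynamic negative sampling described in the abstract concrete, the following is a minimal PyTorch-style sketch, not the authors' released code: names such as LightXMLSketch, cluster_to_labels, and topk_clusters are illustrative assumptions. The transformer encodes the text once; a recall head scores label clusters, and the candidate labels from the currently top-scoring clusters (positives plus hard negatives) are passed to the ranking head, so the negative set shifts as training proceeds.

    import torch
    import torch.nn as nn

    class LightXMLSketch(nn.Module):
        """Sketch of a shared-encoder recall-and-rank model with dynamic negatives."""
        def __init__(self, encoder, hidden_size, num_labels,
                     cluster_to_labels, topk_clusters=10):
            super().__init__()
            self.encoder = encoder  # e.g., a HuggingFace BERT encoder (assumption)
            num_clusters = cluster_to_labels.size(0)
            self.recall_head = nn.Linear(hidden_size, num_clusters)  # scores label clusters
            self.label_emb = nn.Embedding(num_labels, hidden_size)   # per-label ranking weights
            # Static cluster-to-label assignment, shape [C, labels_per_cluster]
            self.register_buffer('cluster_to_labels', cluster_to_labels)
            self.topk = topk_clusters

        def forward(self, input_ids, attention_mask):
            # One shared text representation feeds both the recalling and ranking parts.
            h = self.encoder(input_ids, attention_mask=attention_mask)[0][:, 0]  # [B, H]

            # Label recalling: score all clusters (meta-labels).
            cluster_scores = self.recall_head(h)  # [B, C]

            # Dynamic negative sampling: candidates come from the clusters the
            # recalling part currently ranks highest, so the negatives change
            # from step to step as the model trains.
            top_clusters = cluster_scores.topk(self.topk, dim=1).indices  # [B, K]
            candidates = self.cluster_to_labels[top_clusters].flatten(1)  # [B, K*L]

            # Label ranking: score only the candidates (positives + hard negatives).
            cand_emb = self.label_emb(candidates)                    # [B, K*L, H]
            label_scores = torch.einsum('bh,bkh->bk', h, cand_emb)   # [B, K*L]
            return cluster_scores, candidates, label_scores

Training would apply a binary cross-entropy loss to cluster_scores against cluster-level targets and to label_scores against whether each candidate is a true label of the text; because both heads share one encoder forward pass, the whole model trains end-to-end.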
Pages: 7987-7994
Page count: 8
References
23 in total (first 10 shown)
  • [1] [Anonymous]. Joint European Conference on Machine Learning, 2008.
  • [2] [Anonymous]. Proceedings of the 7th ACM Conference on Recommender Systems, 2013.
  • [3] Babbar, Rohit; Schoelkopf, Bernhard. Data scarcity, robustness and extreme multi-label classification. Machine Learning, 2019, 108(8-9): 1329-1351.
  • [4] Babbar, Rohit; Schoelkopf, Bernhard. DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification. WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017: 721-729.
  • [5] Bhatia, K. 29th Annual Conference on Neural Information Processing Systems, 2015, Vol. 28.
  • [6] Chang, Wei-Cheng; Yu, Hsiang-Fu; Zhong, Kai; Yang, Yiming; Dhillon, Inderjit S. Taming Pretrained Transformers for Extreme Multi-label Text Classification. KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020: 3163-3171.
  • [7] Dekel, O. Proceedings of Machine Learning Research, 2010: 137.
  • [8] Devlin, J. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Vol. 1: 4171.
  • [9] Izmailov, P. Uncertainty in Artificial Intelligence, 2018: 876.
  • [10] Khandagale, S. CoRR, 2019.