Taming Pretrained Transformers for Extreme Multi-label Text Classification

Cited by: 143
Authors
Chang, Wei-Cheng [1]
Yu, Hsiang-Fu [2]
Zhong, Kai [2]
Yang, Yiming [1]
Dhillon, Inderjit S. [2,3]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Amazon, Bellevue, WA USA
[3] UT Austin, Austin, TX USA
Source
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2020
Keywords
Transformer models; eXtreme Multi-label text classification
DOI
10.1145/3394486.3403368
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We consider the extreme multi-label text classification (XMC) problem: given an input text, return the most relevant labels from a large label collection. For example, the input text could be a product description on Amazon.com and the labels could be product categories. XMC is an important yet challenging problem in the NLP community. Recently, deep pretrained transformer models have achieved state-of-the-art performance on many NLP tasks, including sentence classification, albeit with small label sets. However, naively applying deep transformer models to the XMC problem leads to suboptimal performance due to the large output space and the label sparsity issue. In this paper, we propose X-Transformer, the first scalable approach to fine-tuning deep transformer models for the XMC problem. The proposed method achieves new state-of-the-art results on four XMC benchmark datasets. In particular, on a Wiki dataset with around 0.5 million labels, X-Transformer reaches 77.28% Prec@1, a substantial improvement over the state-of-the-art XMC approaches Parabel (linear) and AttentionXML (neural), which achieve 68.70% and 76.95% Prec@1, respectively. We further apply X-Transformer to a product2query dataset from Amazon and obtain a 10.7% relative improvement in Prec@1 over Parabel.
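To make the quoted numbers concrete, the minimal Python sketch below shows how Precision@k (the Prec@1 metric cited above) is typically computed in XMC evaluation, together with a toy cluster-then-rank scorer illustrating why partitioning a huge label space keeps inference tractable. The function names, the random stand-in scorers, and the even cluster partition are illustrative assumptions, not the authors' implementation; X-Transformer's actual matcher is a fine-tuned transformer, which this sketch does not reproduce.

import numpy as np

def precision_at_k(scores, relevant, k=1):
    """Precision@k averaged over samples.

    scores:   (n_samples, n_labels) array of predicted label scores.
    relevant: list of sets; relevant[i] holds the true label ids of sample i.
    """
    topk = np.argsort(-scores, axis=1)[:, :k]  # indices of the k highest-scoring labels
    return float(np.mean([len(set(p) & r) / k for p, r in zip(topk, relevant)]))

def two_stage_topk(x, cluster_scorer, label_scorer, clusters, beam=2, k=5):
    """Toy cluster-then-rank retrieval: score the few clusters first, then
    rank only the labels inside the top `beam` clusters, so most of the
    label space is never scored at all."""
    top_clusters = np.argsort(-cluster_scorer(x))[:beam]
    candidates = np.concatenate([clusters[c] for c in top_clusters])
    order = np.argsort(-label_scorer(x, candidates))[:k]
    return candidates[order]

# Toy usage: 2 samples, 5 labels; the top-ranked label is relevant for both.
scores = np.array([[0.9, 0.1, 0.3, 0.7, 0.2],
                   [0.2, 0.8, 0.1, 0.6, 0.9]])
print(precision_at_k(scores, [{0, 3}, {4}], k=1))  # -> 1.0

# Toy usage: 1000 labels split into 10 clusters; random scorers stand in
# for trained matcher/ranker models (hypothetical, for illustration only).
rng = np.random.default_rng(0)
clusters = np.array_split(np.arange(1000), 10)
top5 = two_stage_topk("a product description",
                      lambda x: rng.random(10),
                      lambda x, cand: rng.random(len(cand)),
                      clusters)
print(top5)

Per query, stage 1 scores only the clusters and stage 2 scores only beam x (labels per cluster) candidates, so the cost grows far more slowly than the label count; this is the kind of scalability the abstract claims for X-Transformer at the 0.5-million-label scale.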
Pages: 3163-3171
Page count: 9
References (32 in total; first 10 shown)
[1] [Anonymous]. CoRR, 2019.
[2] Babbar, Rohit; Schoelkopf, Bernhard. Data scarcity, robustness and extreme multi-label classification. Machine Learning, 2019, 108(8-9): 1329-1351.
[3] Babbar, Rohit; Schoelkopf, Bernhard. DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification. WSDM'17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017: 721-729.
[4] Bhatia, Kush; et al. Sparse Local Embeddings for Extreme Multi-label Classification. Advances in Neural Information Processing Systems, 2015, 28.
[6] Devlin, Jacob; et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT 2019, Vol. 1: 4171.
[7] Guo, C. Advances in Neural Information Processing Systems, 2019: 4944.
[8] Jain, Himanshu; Balasubramanian, Venkatesh; Chunduri, Bhanu; Varma, Manik. Slice: Scalable Linear Extreme Classifiers Trained on 100 Million Labels for Related Searches. WSDM'19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 2019: 528-536.
[9] Jain, Himanshu; Prabhu, Yashoteja; Varma, Manik. Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications. KDD'16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016: 935-944.
[10] Khandagale, Sujay; et al. Bonsai: Diverse and Shallow Trees for Extreme Multi-label Classification. arXiv:1904.08249, 2019.