Taming Pretrained Transformers for Extreme Multi-label Text Classification

Cited by: 143
Authors
Chang, Wei-Cheng [1]
Yu, Hsiang-Fu [2]
Zhong, Kai [2]
Yang, Yiming [1]
Dhillon, Inderjit S. [2,3]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Amazon, Bellevue, WA USA
[3] UT Austin, Austin, TX USA
Source
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2020
Keywords
Transformer models; eXtreme Multi-label text classification
DOI
10.1145/3394486.3403368
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We consider the extreme multi-label text classification (XMC) problem: given an input text, return the most relevant labels from a large label collection. For example, the input text could be a product description on Amazon.com and the labels could be product categories. XMC is an important yet challenging problem in the NLP community. Recently, deep pretrained transformer models have achieved state-of-the-art performance on many NLP tasks, including sentence classification, albeit with small label sets. However, naively applying deep transformer models to the XMC problem leads to suboptimal performance due to the large output space and the label sparsity issue. In this paper, we propose X-Transformer, the first scalable approach to fine-tuning deep transformer models for the XMC problem. The proposed method achieves new state-of-the-art results on four XMC benchmark datasets. In particular, on a Wiki dataset with around 0.5 million labels, X-Transformer reaches 77.28% Prec@1, a substantial improvement over the state-of-the-art XMC approaches Parabel (linear) and AttentionXML (neural), which achieve 68.70% and 76.95% Prec@1, respectively. We further apply X-Transformer to a product2query dataset from Amazon and obtain a 10.7% relative improvement in Prec@1 over Parabel.
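To make the quoted numbers concrete, the minimal Python sketch below shows how Precision@k (the Prec@1 metric cited above) is typically computed in XMC evaluation, together with a toy cluster-then-rank scorer illustrating why partitioning a huge label space keeps inference tractable. The function names, the random stand-in scorers, and the even cluster partition are illustrative assumptions, not the authors' implementation; X-Transformer's actual matcher is a fine-tuned transformer, which this sketch does not reproduce.

import numpy as np

def precision_at_k(scores, relevant, k=1):
    """Precision@k averaged over samples.

    scores:   (n_samples, n_labels) array of predicted label scores.
    relevant: list of sets; relevant[i] holds the true label ids of sample i.
    """
    topk = np.argsort(-scores, axis=1)[:, :k]  # indices of the k highest-scoring labels
    return float(np.mean([len(set(p) & r) / k for p, r in zip(topk, relevant)]))

def two_stage_topk(x, cluster_scorer, label_scorer, clusters, beam=2, k=5):
    """Toy cluster-then-rank retrieval: score the few clusters first, then
    rank only the labels inside the top `beam` clusters, so most of the
    label space is never scored at all."""
    top_clusters = np.argsort(-cluster_scorer(x))[:beam]
    candidates = np.concatenate([clusters[c] for c in top_clusters])
    order = np.argsort(-label_scorer(x, candidates))[:k]
    return candidates[order]

# Toy usage: 2 samples, 5 labels; the top-ranked label is relevant for both.
scores = np.array([[0.9, 0.1, 0.3, 0.7, 0.2],
                   [0.2, 0.8, 0.1, 0.6, 0.9]])
print(precision_at_k(scores, [{0, 3}, {4}], k=1))  # -> 1.0

# Toy usage: 1000 labels split into 10 clusters; random scorers stand in
# for trained matcher/ranker models (hypothetical, for illustration only).
rng = np.random.default_rng(0)
clusters = np.array_split(np.arange(1000), 10)
top5 = two_stage_topk("a product description",
                      lambda x: rng.random(10),
                      lambda x, cand: rng.random(len(cand)),
                      clusters)
print(top5)

Per query, stage 1 scores only the clusters and stage 2 scores only beam x (labels per cluster) candidates, so the cost grows far more slowly than the label count; this is the kind of scalability the abstract claims for X-Transformer at the 0.5-million-label scale.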
Pages: 3163-3171
Page count: 9
References (32 in total; first 10 shown)
[1] [Anonymous]. CoRR, 2019.
[2] Babbar, Rohit; Schoelkopf, Bernhard. Data scarcity, robustness and extreme multi-label classification. Machine Learning, 2019, 108(8-9): 1329-1351.
[3] Babbar, Rohit; Schoelkopf, Bernhard. DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification. WSDM'17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017: 721-729.
[4] Bhatia, Kush; et al. Sparse Local Embeddings for Extreme Multi-label Classification. Advances in Neural Information Processing Systems, 2015, 28.
[6] Devlin, Jacob; et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT 2019, Vol. 1: 4171.
[7] Guo, C. Advances in Neural Information Processing Systems, 2019: 4944.
[8] Jain, Himanshu; Balasubramanian, Venkatesh; Chunduri, Bhanu; Varma, Manik. Slice: Scalable Linear Extreme Classifiers Trained on 100 Million Labels for Related Searches. WSDM'19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 2019: 528-536.
[9] Jain, Himanshu; Prabhu, Yashoteja; Varma, Manik. Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications. KDD'16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016: 935-944.
[10] Khandagale, Sujay; et al. Bonsai: Diverse and Shallow Trees for Extreme Multi-label Classification. arXiv:1904.08249, 2019.