DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining

Cited by: 0
Authors
Jiang, Weifeng [1 ,2 ]
Mao, Qianren [2 ]
Lin, Chenghua [3 ]
Li, Jianxin [2 ,4 ]
Deng, Ting [4 ]
Yang, Weiyi [4 ]
Wang, Zheng [5 ]
Affiliations
[1] Nanyang Technol Univ, SCSE, Singapore, Singapore
[2] Zhongguancun Lab, Beijing, Peoples R China
[3] Univ Manchester, Dept Comp Sci, Manchester, Lancs, England
[4] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
[5] Univ Leeds, Sch Comp, Leeds, W Yorkshire, England
Source
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Many text mining models are constructed by fine-tuning a large deep pre-trained language model (PLM) on downstream tasks. However, a significant challenge nowadays is maintaining performance when using a lightweight model with limited labelled samples. We present DisCo, a semi-supervised learning (SSL) framework for fine-tuning a cohort of small student models generated from a large PLM using knowledge distillation. Our key insight is to share complementary knowledge among distilled student cohorts to promote their SSL effectiveness. DisCo employs a novel co-training technique to optimize a cohort of multiple small student models by promoting knowledge sharing among students under diversified views: model views produced by different distillation strategies and data views produced by various input augmentations. We evaluate DisCo on both semi-supervised text classification and extractive summarization tasks. Experimental results show that DisCo can produce student models that are 7.6x smaller and 4.8x faster in inference than the baseline PLMs while maintaining comparable performance. We also show that DisCo-generated student models outperform similar-sized models that are elaborately tuned for the individual tasks.
Pages: 4015-4030
Page count: 16
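
For illustration only: below is a minimal PyTorch sketch of the co-training idea summarised in the abstract, in which two small student models are each trained with a supervised loss on labelled data and a consistency loss that lets each student learn from its peer's predictions on a differently augmented view of the unlabelled data. The TinyStudent class, the augmented views, and the loss weight lam are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the co-training step described in the abstract.
# Not the authors' code: TinyStudent, the augmented views, and the
# loss weight `lam` are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyStudent(nn.Module):
    """Stand-in for a small distilled student (encoder + classifier)."""
    def __init__(self, vocab_size=30522, hidden=128, num_classes=4):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, hidden)  # mean-pools token embeddings
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, token_ids):                # token_ids: (batch, seq_len)
        return self.head(self.embed(token_ids))  # logits: (batch, num_classes)

def cotraining_step(student_a, student_b, labeled, unlabeled, opt, lam=1.0):
    """One update: supervised cross-entropy on labelled data plus a
    cross-student consistency loss on two augmented unlabelled views."""
    x_l, y = labeled          # labelled token ids and gold labels
    u_a, u_b = unlabeled      # two augmentations of the same unlabelled batch

    # Each student fits the labelled batch directly.
    sup = F.cross_entropy(student_a(x_l), y) + F.cross_entropy(student_b(x_l), y)

    # Each student produces soft targets on "its" view ...
    with torch.no_grad():
        tgt_a = F.softmax(student_a(u_a), dim=-1)
        tgt_b = F.softmax(student_b(u_b), dim=-1)
    # ... and supervises its peer on that view (knowledge sharing across views).
    cons = F.kl_div(F.log_softmax(student_b(u_a), dim=-1), tgt_a, reduction="batchmean") \
         + F.kl_div(F.log_softmax(student_a(u_b), dim=-1), tgt_b, reduction="batchmean")

    loss = sup + lam * cons
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    a, b = TinyStudent(), TinyStudent()
    opt = torch.optim.AdamW(list(a.parameters()) + list(b.parameters()), lr=1e-3)
    labeled = (torch.randint(0, 30522, (8, 64)), torch.randint(0, 4, (8,)))
    unlabeled = (torch.randint(0, 30522, (16, 64)), torch.randint(0, 30522, (16, 64)))
    print(cotraining_step(a, b, labeled, unlabeled, opt))

In the paper's terms, the two students would come from different distillation strategies (model views) and u_a / u_b from different input augmentations (data views); the sketch above only shows the shape of the joint objective.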