DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining

Cited by: 0
Authors
Jiang, Weifeng [1 ,2 ]
Mao, Qianren [2 ]
Lin, Chenghua [3 ]
Li, Jianxin [2 ,4 ]
Deng, Ting [4 ]
Yang, Weiyi [4 ]
Wang, Zheng [5 ]
Affiliations
[1] Nanyang Technol Univ, SCSE, Singapore, Singapore
[2] Zhongguancun Lab, Beijing, Peoples R China
[3] Univ Manchester, Dept Comp Sci, Manchester, Lancs, England
[4] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
[5] Univ Leeds, Sch Comp, Leeds, W Yorkshire, England
Source
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many text mining models are built by fine-tuning a large deep pre-trained language model (PLM) on downstream tasks. However, a significant practical challenge is maintaining performance when using a lightweight model with limited labelled samples. We present DisCo, a semi-supervised learning (SSL) framework for fine-tuning a cohort of small student models generated from a large PLM using knowledge distillation. Our key insight is to share complementary knowledge among the distilled student cohort to promote their SSL effectiveness. DisCo employs a novel co-training technique to optimize a cohort of multiple small student models by promoting knowledge sharing among students under diversified views: model views produced by different distillation strategies and data views produced by various input augmentations. We evaluate DisCo on both semi-supervised text classification and extractive summarization tasks. Experimental results show that DisCo can produce student models that are 7.6x smaller and 4.8x faster at inference than the baseline PLMs while maintaining comparable performance. We also show that DisCo-generated student models outperform similar-sized models that are elaborately tuned for each distinct task.
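The abstract names two ingredients: a cohort of distilled students, and knowledge sharing across diversified model views and data views. The following minimal PyTorch sketch illustrates only the training-signal structure of such a co-training setup (supervised loss on the few labelled samples, plus a cross-student consistency loss on differently augmented unlabelled inputs). The stand-in MLP students, the Gaussian-noise "augmentations", the KL-based consistency term, and the weight lam are all illustrative assumptions, not the paper's actual design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def make_student(dim_in=128, n_classes=4):
        # Stand-in for one distilled student; in the paper, students are small
        # PLMs produced by different distillation strategies ("model views").
        return nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, n_classes))

    student_a, student_b = make_student(), make_student()
    opt = torch.optim.Adam(
        list(student_a.parameters()) + list(student_b.parameters()), lr=1e-3)

    # Toy batch: a few labelled examples plus a larger unlabelled pool.
    x_lab, y_lab = torch.randn(8, 128), torch.randint(0, 4, (8,))
    x_unl = torch.randn(32, 128)

    lam = 1.0  # assumed weight on the consistency term
    for step in range(100):
        # Two "data views" of the same unlabelled inputs; real text
        # augmentations (e.g. dropout or token perturbations) would go here.
        view_a = x_unl + 0.1 * torch.randn_like(x_unl)
        view_b = x_unl + 0.1 * torch.randn_like(x_unl)

        # Supervised loss: both students fit the few labelled samples.
        sup = (F.cross_entropy(student_a(x_lab), y_lab)
               + F.cross_entropy(student_b(x_lab), y_lab))

        # Co-training consistency: each student matches the other's
        # (detached) prediction on a different view of the same inputs.
        log_p_a = F.log_softmax(student_a(view_a), dim=-1)
        log_p_b = F.log_softmax(student_b(view_b), dim=-1)
        cons = (F.kl_div(log_p_a, log_p_b.detach(), log_target=True, reduction="batchmean")
                + F.kl_div(log_p_b, log_p_a.detach(), log_target=True, reduction="batchmean"))

        loss = sup + lam * cons
        opt.zero_grad()
        loss.backward()
        opt.step()

In the paper, the students additionally differ by construction (distinct distillation strategies), so their predictions disagree in complementary ways; the sketch above captures only how the supervised and cross-student consistency signals combine during training.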
Pages: 4015 - 4030
Page count: 16
Related Papers
50 items in total
  • [41] Root-Cause Analysis with Semi-Supervised Co-Training for Integrated Systems
    Pan, Renjian
    Li, Xin
    Chakrabarty, Krishnendu
    ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 2024, 29 (03)
  • [42] Co-Training Semi-Supervised Active Learning Algorithm based on Noise Filter
    Chen, Ya-bi
    Zhan, Yong-zhao
    PROCEEDINGS OF THE 2009 WRI GLOBAL CONGRESS ON INTELLIGENT SYSTEMS, VOL III, 2009, : 524 - 528
  • [43] Three-Way Co-Training with Pseudo Labels for Semi-Supervised Learning
    Wang, Liuxin
    Gao, Can
    Zhou, Jie
    Wen, Jiajun
    MATHEMATICS, 2023, 11 (15)
  • [44] Multi-Label Learning with Co-Training Based on Semi-Supervised Regression
    Xu, Meixiang
    Sun, Fuming
    Jiang, Xiaojun
    2014 INTERNATIONAL CONFERENCE ON SECURITY, PATTERN ANALYSIS, AND CYBERNETICS (SPAC), 2014, : 175 - 180
  • [45] Temporal-Frequency Co-training for Time Series Semi-supervised Learning
    Liu, Zhen
    Ma, Qianli
    Ma, Peitian
    Wang, Linghao
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 8923 - 8931
  • [46] Fine-Tuning Language Models For Semi-Supervised Text Mining
    Chen, Xinyu
    Beaver, Ian
    Freeman, Cynthia
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 3608 - 3617
  • [47] An Efficient Approach to Select Instances in Self-Training and Co-Training Semi-Supervised Methods
    Ovidio Vale, Karliane Medeiros
    Gorgonio, Arthur Costa
    Gorgonio, Flavius Da Luz E.
    De Paula Canuto, Anne Magaly
    IEEE ACCESS, 2022, 10 : 7254 - 7276
  • [48] Learning Adaptive Semi-Supervised Multi-Output Soft-Sensors With Co-Training of Heterogeneous Models
    Li, Dong
    Huang, Daoping
    Yu, Guangping
    Liu, Yiqi
    IEEE ACCESS, 2020, 8 : 46493 - 46504
  • [49] Co-Training Semi-Supervised Deep Learning for Sentiment Classification of MOOC Forum Posts
    Chen, Jing
    Feng, Jun
    Sun, Xia
    Liu, Yang
    SYMMETRY-BASEL, 2020, 12 (01)
  • [50] When less is more: on the value of "co-training" for semi-supervised software defect predictors
    Majumder, Suvodeep
    Chakraborty, Joymallya
    Menzies, Tim
    EMPIRICAL SOFTWARE ENGINEERING, 2024, 29