DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining

Cited by: 0
Authors
Jiang, Weifeng [1 ,2 ]
Mao, Qianren [2 ]
Lin, Chenghua [3 ]
Li, Jianxin [2 ,4 ]
Deng, Ting [4 ]
Yang, Weiyi [4 ]
Wang, Zheng [5 ]
Affiliations
[1] Nanyang Technol Univ, SCSE, Singapore, Singapore
[2] Zhongguancun Lab, Beijing, Peoples R China
[3] Univ Manchester, Dept Comp Sci, Manchester, Lancs, England
[4] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
[5] Univ Leeds, Sch Comp, Leeds, W Yorkshire, England
Source
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Many text mining models are constructed by fine-tuning a large deep pre-trained language model (PLM) on downstream tasks. However, a significant challenge nowadays is maintaining performance when using a lightweight model with limited labelled samples. We present DisCo, a semi-supervised learning (SSL) framework for fine-tuning a cohort of small student models generated from a large PLM using knowledge distillation. Our key insight is to share complementary knowledge among distilled student cohorts to promote their SSL effectiveness. DisCo employs a novel co-training technique to optimize a cohort of multiple small student models by promoting knowledge sharing among students under diversified views: model views produced by different distillation strategies and data views produced by various input augmentations. We evaluate DisCo on both semi-supervised text classification and extractive summarization tasks. Experimental results show that DisCo can produce student models that are 7.6x smaller and 4.8x faster in inference than the baseline PLMs while maintaining comparable performance. We also show that DisCo-generated student models outperform similar-sized models that are elaborately tuned for the individual tasks.
Pages: 4015-4030
Page count: 16
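
For illustration only: below is a minimal PyTorch sketch of the co-training idea summarised in the abstract, in which two small student models are each trained with a supervised loss on labelled data and a consistency loss that lets each student learn from its peer's predictions on a differently augmented view of the unlabelled data. The TinyStudent class, the augmented views, and the loss weight lam are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the co-training step described in the abstract.
# Not the authors' code: TinyStudent, the augmented views, and the
# loss weight `lam` are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyStudent(nn.Module):
    """Stand-in for a small distilled student (encoder + classifier)."""
    def __init__(self, vocab_size=30522, hidden=128, num_classes=4):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, hidden)  # mean-pools token embeddings
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, token_ids):                # token_ids: (batch, seq_len)
        return self.head(self.embed(token_ids))  # logits: (batch, num_classes)

def cotraining_step(student_a, student_b, labeled, unlabeled, opt, lam=1.0):
    """One update: supervised cross-entropy on labelled data plus a
    cross-student consistency loss on two augmented unlabelled views."""
    x_l, y = labeled          # labelled token ids and gold labels
    u_a, u_b = unlabeled      # two augmentations of the same unlabelled batch

    # Each student fits the labelled batch directly.
    sup = F.cross_entropy(student_a(x_l), y) + F.cross_entropy(student_b(x_l), y)

    # Each student produces soft targets on "its" view ...
    with torch.no_grad():
        tgt_a = F.softmax(student_a(u_a), dim=-1)
        tgt_b = F.softmax(student_b(u_b), dim=-1)
    # ... and supervises its peer on that view (knowledge sharing across views).
    cons = F.kl_div(F.log_softmax(student_b(u_a), dim=-1), tgt_a, reduction="batchmean") \
         + F.kl_div(F.log_softmax(student_a(u_b), dim=-1), tgt_b, reduction="batchmean")

    loss = sup + lam * cons
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    a, b = TinyStudent(), TinyStudent()
    opt = torch.optim.AdamW(list(a.parameters()) + list(b.parameters()), lr=1e-3)
    labeled = (torch.randint(0, 30522, (8, 64)), torch.randint(0, 4, (8,)))
    unlabeled = (torch.randint(0, 30522, (16, 64)), torch.randint(0, 30522, (16, 64)))
    print(cotraining_step(a, b, labeled, unlabeled, opt))

In the paper's terms, the two students would come from different distillation strategies (model views) and u_a / u_b from different input augmentations (data views); the sketch above only shows the shape of the joint objective.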