An Exploration of Semi-supervised Text Classification

Cited by: 0
Authors
Lien, Henrik [2]
Biermann, Daniel [2]
Palumbo, Fabrizio [1]
Goodwin, Morten [1, 2]
Affiliations
[1] Oslo Metropolitan Univ, Inst Informasjonsteknol, Artificial Intelligence Lab, Oslo, Norway
[2] Univ Agder, Dept ICT, Ctr Artificial Intelligence Res, Grimstad, Norway
Keywords
Machine learning; Text classification; Semi-supervised learning
DOI
10.1007/978-3-031-08223-8_39
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Good performance in supervised text classification usually requires large amounts of labeled training data. However, obtaining labeled data is often expensive and time-consuming. To overcome these limitations, researchers have developed semi-supervised learning (SSL) algorithms that exploit unlabeled data, which are generally easy and free to access. With SSL, unlabeled and labeled data are combined to outperform supervised learning algorithms. However, setting up SSL neural networks for text classification is cumbersome and frequently relies on trial and error. We show that the hyperparameter configuration significantly impacts SSL performance and that the learning rate is the most influential parameter. Increasing model size also improves SSL performance, particularly when less pre-processing data are available. Interestingly, as opposed to feed-forward models, recurrent models generally reach a performance threshold as pre-processing data size increases. This article expands the knowledge on hyperparameters and model size in relation to SSL in text classification. It supports the use of SSL in future NLP projects by optimizing model design and potentially lowering training time, particularly for time-restricted projects.
Pages: 477-488
Number of pages: 12