An Exploration of Semi-supervised Text Classification

Cited by: 0
Authors
Lien, Henrik [2]
Biermann, Daniel [2]
Palumbo, Fabrizio [1]
Goodwin, Morten [1,2]
Affiliations
[1] Oslo Metropolitan University, Department of Information Technology, Artificial Intelligence Lab, Oslo, Norway
[2] University of Agder, Department of ICT, Centre for Artificial Intelligence Research, Grimstad, Norway
Source
ENGINEERING APPLICATIONS OF NEURAL NETWORKS, EAAAI/EANN 2022 | 2022, Vol. 1600
Keywords
Machine learning; Text classification; Semi-supervised learning
DOI
10.1007/978-3-031-08223-8_39
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Good performance in supervised text classification is usually obtained with large amounts of labeled training data. However, obtaining labeled data is often expensive and time-consuming. To overcome these limitations, researchers have developed semi-supervised learning (SSL) algorithms that exploit unlabeled data, which are generally easy and free to access. With SSL, unlabeled and labeled data are combined to outperform purely supervised learning algorithms. However, setting up SSL neural networks for text classification is cumbersome and frequently based on trial and error. We show that the hyperparameter configuration significantly impacts SSL performance, and that the learning rate is the most influential parameter. Additionally, increasing model size also improves SSL performance, particularly when less pre-processing data are available. Interestingly, as opposed to feed-forward models, recurrent models generally reach a performance threshold as pre-processing data size increases. This article expands the knowledge on hyperparameters and model size in relation to SSL for text classification. It supports the use of SSL in future NLP projects by helping optimize model design and potentially lowering training time, particularly in time-restricted settings.
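To make the abstract's setup concrete, below is a minimal self-training sketch of semi-supervised text classification. It assumes scikit-learn's SelfTrainingClassifier, the 20 Newsgroups corpus, TF-IDF features, and a small feed-forward (MLP) base learner with an explicitly chosen learning rate; these choices are illustrative assumptions and not the exact pipeline evaluated in the paper.

import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.semi_supervised import SelfTrainingClassifier

# Small text-classification corpus (a stand-in, not necessarily the paper's data).
data = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))
X = TfidfVectorizer(max_features=5000).fit_transform(data.data)
y = np.asarray(data.target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Hide roughly 90% of the training labels; scikit-learn marks unlabeled rows with -1.
rng = np.random.default_rng(0)
y_ssl = y_train.copy()
y_ssl[rng.random(y_train.shape[0]) > 0.1] = -1

# Feed-forward base learner. The learning rate is set explicitly because the
# paper reports it as the most influential hyperparameter for SSL performance.
base = MLPClassifier(hidden_layer_sizes=(128,), learning_rate_init=1e-3, max_iter=30)

# Self-training: confidently predicted unlabeled examples are pseudo-labeled and
# the model is retrained, so labeled and unlabeled data are combined in one classifier.
ssl_model = SelfTrainingClassifier(base, threshold=0.9)
ssl_model.fit(X_train, y_ssl)

print("test accuracy:", ssl_model.score(X_test, y_test))

Varying learning_rate_init, hidden_layer_sizes, and the fraction of hidden labels in a sketch like this is one way to probe the hyperparameter and model-size effects the abstract describes.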
Pages: 477-488
Page count: 12