A review of semi-supervised learning for text classification

Citations: 30
Authors
Duarte, Jose Marcio [1]
Berton, Lilian [1]
Affiliations
[1] Univ Fed Sao Paulo, Sci & Technol Dept, Cesare Mansueto Giulio Lattes Ave 1201, BR-12247014 Sao Jose Dos Campos, SP, Brazil
Keywords
Natural language processing; Text classification; Machine learning; Semi-supervised learning; Sentiment analysis; Information; Selection
DOI
10.1007/s10462-023-10393-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A huge amount of data is generated daily, leading to big data challenges. One of them is related to text mining, especially text classification. To perform this task, we usually need a large set of labeled data, which can be expensive, time-consuming, or difficult to obtain. Considering this scenario, semi-supervised learning (SSL), the branch of machine learning concerned with using both labeled and unlabeled data, has expanded in volume and scope. Since no recent survey overviews how SSL has been used in text classification, we aim to fill this gap and present an up-to-date review of SSL for text classification. We retrieved 1794 works from the last 5 years from IEEE Xplore, ACM Digital Library, Science Direct, and Springer. Then, 157 articles were selected for inclusion in this review. We present the application domains, datasets, and languages employed in the works, as well as the text representations and machine learning algorithms. We also summarize and organize the works following a recent taxonomy of SSL. We analyze the percentage of labeled data used, the evaluation metrics, and the results obtained. Lastly, we present some limitations and future trends in the area. We aim to provide researchers and practitioners with an outline of the area as well as useful information for their current research.
Pages: 9401-9469
Page count: 69
Related papers
50 in total
  • [1] A review of semi-supervised learning for text classification
    Duarte, José Marcio
    Berton, Lilian
    Artificial Intelligence Review, 2023, 56 : 9401 - 9469
  • [2] Semi-Supervised Text Classification With Universum Learning
    Liu, Chien-Liang
    Hsiao, Wen-Hoar
    Lee, Chia-Hoang
    Chang, Tao-Hsing
    Kuo, Tsung-Hsun
    IEEE TRANSACTIONS ON CYBERNETICS, 2016, 46 (02) : 462 - 473
  • [3] TEXT CLASSIFICATION BASED ON SEMI-SUPERVISED LEARNING
    Vo Duy Thanh
    Vo Trung Hung
    Pham Minh Tuan
    Doan Van Ban
    2013 INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION (SOCPAR), 2013, : 232 - 236
  • [4] SEMI-SUPERVISED LEARNING FOR TEXT CLASSIFICATION BY LAYER PARTITIONING
    Li, Alexander Hanbo
    Sethy, Abhinav
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6164 - 6168
  • [5] An Exploration of Semi-supervised Text Classification
    Lien, Henrik
    Biermann, Daniel
    Palumbo, Fabrizio
    Goodwin, Morten
    ENGINEERING APPLICATIONS OF NEURAL NETWORKS, EAAAI/EANN 2022, 2022, 1600 : 477 - 488
  • [6] Semi-supervised collaborative text classification
    Jin, Rong
    Wu, Ming
    Sukthankar, Rahul
    MACHINE LEARNING: ECML 2007, PROCEEDINGS, 2007, 4701 : 600 - +
  • [7] Graph-based Semi-supervised Learning for Text Classification
    Widmann, Natalie
    Verberne, Suzan
    ICTIR'17: PROCEEDINGS OF THE 2017 ACM SIGIR INTERNATIONAL CONFERENCE THEORY OF INFORMATION RETRIEVAL, 2017, : 59 - 66
  • [8] Cooperative Hybrid Semi-Supervised Learning for Text Sentiment Classification
    Li, Yang
    Lv, Ying
    Wang, Suge
    Liang, Jiye
    Li, Juanzi
    Li, Xiaoli
    SYMMETRY-BASEL, 2019, 11 (02):
  • [9] Text Classification Method Based On Semi-Supervised Transfer Learning
    Yu, Xiaosheng
    Zhang, Hehuan
    Li, Jing
    2021 21ST INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY COMPANION (QRS-C 2021), 2021, : 388 - 394
  • [10] Improving Semi-Supervised Text Classification with Dual Meta-Learning
    Li, Shujie
    Yuan, Guanghu
    Yang, Min
    Shen, Ying
    Li, Chengming
    Xu, Ruifeng
    Zhao, Xiaoyan
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2024, 42 (04)