An Active Semi-Supervised Short Text Classification Method Based on Federated Learning

Cited by: 0
Authors
Kong, De-Yan [1 ]
Ji, Zhen-Yan [2 ]
Yang, Yan-Yan [1 ]
Liu, Yang [1 ]
Liu, Ji-Qiang [2 ]
Affiliations
[1] School of Software Engineering, Beijing Jiaotong University, Beijing
[2] Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, School of Cyberspace Science and Technology, Beijing Jiaotong University, Beijing
Source
Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2024 / Vol. 52 / No. 10
Funding
National Natural Science Foundation of China;
Keywords
active learning; federated learning; heterogeneous graph neural network; semi-supervised learning;
D O I
10.12263/DZXB.20230703
Abstract
Short-text classification is widely used and is an active research topic. However, its performance is hampered by the scarcity of annotated short-text data and by the difficulty of centrally training on private data. To address these issues, we propose Fed-ASSL-HGAT, an Active Semi-Supervised heterogeneous Graph ATtention network model based on Federated learning. The model uses an innovative active semi-supervised learning (ASSL) framework to generate high-quality labeled samples that empower the heterogeneous graph attention network (HGAT) model. In addition, federated learning is introduced to enable joint training of models deployed on different nodes, thereby satisfying data privacy protection requirements. The proposed ASSL framework significantly reduces annotation difficulty by transforming the multi-class annotation task into a binary classification task. To mitigate information loss, a selection strategy based on information gain is employed to filter soft and hard labels. Semi-supervised learning is used to select positive and negative samples with high accuracy and stability for pseudo-labeling, thereby ensuring labeling quality. Experimental results demonstrate that the proposed ASSL-HGAT (Active Semi-supervised Learning empowered Heterogeneous Graph ATtention network) model improves F1 scores by 2.45%, 8.11%, and 7.46% compared with the HGAT baseline model on the AGNews, Snippets, and TagMyNews datasets, respectively. By incorporating federated learning, the Fed-ASSL-HGAT model meets the performance requirements without sacrificing data privacy. © 2024 Chinese Institute of Electronics. All rights reserved.
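The federated component described in the abstract builds on joint training of models deployed on different nodes, in the spirit of FedAvg (reference [4]). Below is a minimal sketch of the standard server-side weighted aggregation step, not the actual Fed-ASSL-HGAT training loop (which this record does not specify); the function and variable names are illustrative assumptions.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate per-client model parameters by sample-weighted averaging.

    client_weights: one list of np.ndarray parameter tensors per client
    client_sizes:   number of local training samples on each client
    Returns the aggregated parameter list for the global model.
    """
    total = sum(client_sizes)
    coeffs = [n / total for n in client_sizes]  # weight clients by data size
    # Accumulate the weighted sum, layer by layer
    agg = [np.zeros_like(layer) for layer in client_weights[0]]
    for weights, c in zip(client_weights, coeffs):
        for i, layer in enumerate(weights):
            agg[i] += c * layer
    return agg

# Two toy clients: client B holds 3x as much data, so it dominates the average
client_a = [np.array([1.0, 1.0])]
client_b = [np.array([3.0, 3.0])]
global_weights = fedavg([client_a, client_b], client_sizes=[1, 3])
print(global_weights[0])  # [2.5 2.5]
```

Only the aggregated parameters, never the raw short texts, leave each node, which is how this style of training accommodates the data privacy requirement mentioned in the abstract.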
Pages: 3517-3526
Number of pages: 9
Related papers
22 in total
[1]  
ZHANG Y, LIU K F, ZHANG Q X, et al., A combined-convolutional neural network for Chinese news text classification, Acta Electronica Sinica, 49, 6, pp. 1059-1067, (2021)
[2]  
LI X Y, WANG T L, LIANG P, et al., Automatic classification of non-functional requirements in App user reviews based on system model, Acta Electronica Sinica, 50, 9, pp. 2079-2089, (2022)
[3]  
YANG T C, HU L M, SHI C, et al., HGAT: Heterogeneous graph attention networks for semi-supervised short text classification, ACM Transactions on Information Systems, 39, 3, (2021)
[4]  
MCMAHAN H B, MOORE E, RAMAGE D, et al., Communication-efficient learning of deep networks from decentralized data, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 1273-1282, (2017)
[5]  
ISCEN A, TOLIAS G, AVRITHIS Y, et al., Label propagation for deep semi-supervised learning, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5065-5074, (2019)
[6]  
HAASE-SCHUTZ C, STAL R, HERTLEIN H, et al., Iterative label improvement: Robust training by confidence based filtering and dataset partitioning, 2020 25th International Conference on Pattern Recognition (ICPR), pp. 9483-9490, (2021)
[7]  
BERTHELOT D, CARLINI N, GOODFELLOW I, et al., MixMatch: A holistic approach to semi-supervised learning, Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 5049-5059, (2019)
[8]  
ZHOU T Y, WANG S J, BILMES J A., Time-consistent self-supervision for semi-supervised learning, Proceedings of the 37th International Conference on Machine Learning, pp. 11523-11533, (2020)
[9]  
RIZVE M N, DUARTE K, RAWAT Y S, et al., In defense of pseudo-labeling: An uncertainty-aware pseudo-label selection framework for semi-supervised learning, International Conference on Learning Representations (ICLR), (2021)
[10]  
LI Y C, XIAO F, CHEN Z, et al., Adaptive active learning for semi-supervised learning, Journal of Software, 31, 12, pp. 3808-3822, (2020)