A debiased self-training framework with graph self-supervised pre-training aided for semi-supervised rumor detection

Citations: 0
Authors
Qiao, Yuhan [1]
Cui, Chaoqun [1]
Wang, Yiying [1]
Jia, Caiyan [1]
Affiliations
[1] Beijing Jiaotong University, School of Computer and Information Technology, Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing 100044, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Rumor detection; Self-training; Semi-supervised learning; Self-supervised learning; Confirmation bias; Graph representation; Propagation; Network
DOI
10.1016/j.neucom.2024.128314
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Existing rumor detection models have achieved remarkable performance in fully supervised settings. However, obtaining extensive labeled rumor data is time-consuming and labor-intensive. To mitigate the reliance on labeled data, semi-supervised learning (SSL), which learns jointly from labeled and unlabeled samples, achieves significant performance improvements at low cost. Self-training, a commonly used SSL method, is simple and efficient but suffers from the notorious confirmation bias, i.e., the accumulation of noise caused by the use of incorrect pseudo-labels. To deal with this problem, we propose a debiased self-training framework with graph self-supervised pre-training for semi-supervised rumor detection. First, to strengthen the initial model for self-training and reduce the generation of incorrect pseudo-labels in the early stages, we leverage the rumor propagation structures of massive unlabeled data for graph self-supervised pre-training. Second, we improve the quality of pseudo-labels with a pseudo-labeling strategy based on self-adaptive thresholds, which combines self-paced global thresholds that control the overall utilization of pseudo-labels with local class-specific thresholds that track the learning status of each class. Extensive experiments on four public benchmarks demonstrate that our method significantly outperforms previous rumor detection baselines in semi-supervised settings, especially when labeled samples are extremely scarce. Notably, it achieves 96.3% accuracy on Weibo with 500 labels per class and 86.0% accuracy with just 5 labels per class.
Pages: 14