A debiased self-training framework with graph self-supervised pre-training aided for semi-supervised rumor detection

Cited by: 0
Authors
Qiao, Yuhan [1 ]
Cui, Chaoqun [1 ]
Wang, Yiying [1 ]
Jia, Caiyan [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Rumor detection; Self-training; Semi-supervised learning; Self-supervised learning; Confirmation bias; Graph representation; PROPAGATION; NETWORK;
DOI
10.1016/j.neucom.2024.128314
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing rumor detection models achieve remarkable performance in fully supervised settings. However, obtaining extensive labeled rumor data is time-consuming and labor-intensive. To mitigate the reliance on labeled data, semi-supervised learning (SSL), which jointly learns from labeled and unlabeled samples, achieves significant performance improvements at low cost. Self-training, a commonly used SSL method, is simple and efficient but suffers from the notorious confirmation bias: the accumulation of noise caused by training on incorrect pseudo-labels. To address this problem, we propose a debiased self-training framework with graph self-supervised pre-training for semi-supervised rumor detection. First, to strengthen the initial model for self-training and reduce the generation of incorrect pseudo-labels in the early stages, we leverage the rumor propagation structures of massive unlabeled data for graph self-supervised pre-training. Second, we improve pseudo-label quality with a pseudo-labeling strategy based on self-adaptive thresholds, which consists of self-paced global thresholds controlling the overall utilization of pseudo-labels and local class-specific thresholds tracking the learning status of each class. Extensive experiments on four public benchmarks demonstrate that our method significantly outperforms previous rumor detection baselines in semi-supervised settings, especially when labeled samples are extremely scarce. Notably, we achieve 96.3% accuracy on Weibo with 500 labels per class and 86.0% accuracy with just 5 labels per class.
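The two components described in the abstract can be made concrete with short sketches. First, a minimal Python/PyTorch sketch of contrastive graph self-supervised pre-training on unlabeled propagation structures. This is a reconstruction from the abstract's description alone, not the authors' implementation; encoder and augment are hypothetical stand-ins for a GNN over propagation trees and a graph augmentation such as edge dropping or node masking.

import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    # z1, z2: (batch, dim) graph-level embeddings of two views of the same graphs.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau             # pairwise similarity between views
    labels = torch.arange(z1.size(0))      # positive pairs lie on the diagonal
    return F.cross_entropy(logits, labels)

def pretrain_step(encoder, augment, graphs, optimizer):
    # One self-supervised step on a batch of unlabeled propagation graphs:
    # two augmented views are encoded and pulled together by the InfoNCE loss.
    z1 = encoder(augment(graphs))
    z2 = encoder(augment(graphs))
    loss = info_nce(z1, z2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Second, a sketch of self-adaptive pseudo-label thresholding in the spirit the abstract describes: a self-paced global threshold follows an exponential moving average of the model's confidence on unlabeled data, and local class-specific thresholds rescale it by each class's estimated learning status. The class name, hyperparameters, and exact update rule here are assumptions; the paper's formulation may differ.

import torch

class AdaptiveThreshold:
    # Illustrative sketch only; not the authors' released code.
    def __init__(self, num_classes: int, ema_momentum: float = 0.999):
        self.m = ema_momentum
        # Global confidence estimate (self-paced: rises as the model improves).
        self.global_conf = torch.tensor(1.0 / num_classes)
        # Per-class mean probabilities, a proxy for each class's learning status.
        self.class_conf = torch.full((num_classes,), 1.0 / num_classes)

    @torch.no_grad()
    def update(self, probs: torch.Tensor) -> None:
        # probs: (batch, num_classes) softmax outputs on unlabeled samples.
        max_p, _ = probs.max(dim=1)
        self.global_conf = self.m * self.global_conf + (1 - self.m) * max_p.mean()
        self.class_conf = self.m * self.class_conf + (1 - self.m) * probs.mean(dim=0)

    @torch.no_grad()
    def mask(self, probs: torch.Tensor) -> torch.Tensor:
        # Local thresholds: scale the global one by normalized class status, so
        # under-learned classes get a lower bar and still contribute pseudo-labels.
        local = self.class_conf / self.class_conf.max() * self.global_conf
        max_p, pred = probs.max(dim=1)
        return max_p >= local[pred]    # boolean mask of accepted pseudo-labels

In a self-training loop, update() would be called on each batch of unlabeled softmax outputs, and mask() would select which argmax predictions are trusted as pseudo-labels for the unsupervised loss term.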
Pages: 14