A debiased self-training framework with graph self-supervised pre-training aided for semi-supervised rumor detection

Citations: 0
Authors
Qiao, Yuhan [1]
Cui, Chaoqun [1]
Wang, Yiying [1]
Jia, Caiyan [1]
Affiliations
[1] Beijing Jiaotong University, School of Computer and Information Technology, Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing 100044, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Rumor detection; Self-training; Semi-supervised learning; Self-supervised learning; Confirmation bias; Graph representation; Propagation; Network
DOI
10.1016/j.neucom.2024.128314
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Existing rumor detection models have achieved remarkable performance in fully supervised settings. However, obtaining extensive labeled rumor data is time-consuming and labor-intensive. To mitigate the reliance on labeled data, semi-supervised learning (SSL), which learns jointly from labeled and unlabeled samples, achieves significant performance improvements at low cost. Self-training, a commonly used SSL method, is simple and efficient but suffers from the notorious confirmation bias, i.e., the accumulation of noise caused by the use of incorrect pseudo-labels. To deal with this problem, we propose a debiased self-training framework with graph self-supervised pre-training for semi-supervised rumor detection. First, to strengthen the initial model for self-training and reduce the generation of incorrect pseudo-labels in the early stages, we leverage the rumor propagation structures of massive unlabeled data for graph self-supervised pre-training. Second, we improve the quality of pseudo-labels with a pseudo-labeling strategy based on self-adaptive thresholds, which combines self-paced global thresholds that control the overall utilization of pseudo-labels with local class-specific thresholds that track the learning status of each class. Extensive experiments on four public benchmarks demonstrate that our method significantly outperforms previous rumor detection baselines in semi-supervised settings, especially when labeled samples are extremely scarce. Notably, it achieves 96.3% accuracy on Weibo with 500 labels per class and 86.0% accuracy with just 5 labels per class.
Pages: 14