SSDMV: Semi-supervised Deep Social Spammer Detection by Multi-View Data Fusion

Cited by: 22
Authors
Li, Chaozhuo [1]
Wang, Senzhang [2]
He, Lifang [3]
Yu, Philip S. [4, 5]
Liang, Yanbo [6]
Li, Zhoujun [1]
Affiliations
[1] Beihang Univ, Beijing, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Beijing, Peoples R China
[3] Cornell Univ, New York, NY 10021 USA
[4] Fudan Univ, Shanghai, Peoples R China
[5] Univ Illinois, Chicago, IL 60680 USA
[6] Hortonworks Inc, San Jose, CA USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM) | 2018
Funding
National Key Research and Development Program of China
Keywords
Social Spammer Detection; Deep Learning; Semi-supervised Learning
DOI
10.1109/ICDM.2018.00040
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The explosive use of social media makes it a popular platform for malicious users, known as social spammers, to overwhelm legitimate users with unwanted content. Most existing social spammer detection approaches are supervised and need a large amount of manually labeled data for training, which is infeasible in practice. To address this issue, some semi-supervised models have been proposed that incorporate side information such as user profiles and posted tweets. However, these shallow models are not effective at learning the user representations needed for spammer detection, and the multi-view data are usually loosely coupled without considering their correlations. In this paper, we propose a Semi-Supervised Deep social spammer detection model by Multi-View data fusion (SSDMV). The key idea is to extensively learn task-relevant, discriminative user representations to address the challenge of annotation scarcity. Under a unified semi-supervised learning framework, we first design a deep multi-view feature learning module that fuses information from different views, and then propose a label inference module to predict labels for users. The mutual refinement between the two modules enables SSDMV to both generate high-quality features and make accurate predictions. Empirically, we evaluate SSDMV on two real social network datasets across three tasks, and the results demonstrate that SSDMV significantly outperforms the state-of-the-art methods.
Pages: 247-256
Number of pages: 10