Matching user accounts based on user generated content across social networks

Cited by: 61
Authors
Li, Yongjun [1]
Zhang, Zhen [1]
Peng, You [1]
Yin, Hongzhi [2]
Xu, Quanqing [3]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp, Xian 710072, Shaanxi, Peoples R China
[2] Univ Queensland, Sch ITEE, Brisbane, Qld 4072, Australia
[3] ASTAR, Data Storage Inst, Singapore 138632, Singapore
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2018 / Vol. 83
Keywords
User identification; User generated content; Online behavior analysis; Social network; Machine learning; Algorithms; Experimentation; IDENTIFICATION;
DOI
10.1016/j.future.2018.01.041
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Matching user accounts across social networks helps build better user profiles and benefits many applications, and it has attracted much attention from both industry and academia. Most existing work relies mainly on rich user profile attributes. However, in many cases user profile attributes are unavailable, incomplete or unreliable, either because of privacy settings or because users decline to share their information, which makes existing schemes fragile. Users often share their activities on different social networks, and this provides an opportunity to overcome the problem. We aim to address the problem of user identification based on User Generated Content (UGC). We first formulate the problem of user identification based on UGCs and then propose a UGC-based user identification model. A supervised machine learning based solution is presented in three steps: first, we propose several algorithms to measure the spatial, temporal and content similarity of two UGCs; second, we extract spatial, temporal and content features to exploit these similarities; finally, we employ a machine learning method to match user accounts and conduct experiments on three ground truth datasets. The results show that the proposed method performs well, with F1 values reaching 89.79%, 86.78% and 86.24% on the three ground truth datasets, respectively. This work demonstrates the possibility of matching user accounts with highly accessible online data. (C) 2018 Elsevier B.V. All rights reserved.
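The abstract names three kinds of similarity between two UGCs (spatial, temporal and content) but does not give the algorithms. The sketch below is only an illustration of how such similarities could be turned into a feature vector for a pair of accounts. It is not the authors' method: the `Post` schema, the exponential decay scales, the Jaccard token overlap and the max-over-pairs aggregation are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    """One piece of user generated content (hypothetical schema)."""
    lat: float        # latitude in degrees
    lon: float        # longitude in degrees
    timestamp: float  # seconds since the epoch
    text: str


def haversine_km(a: Post, b: Post) -> float:
    """Great-circle distance between the two posts' locations, in km."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def spatial_similarity(a: Post, b: Post, scale_km: float = 1.0) -> float:
    """Assumed form: decays exponentially with distance (1.0 = same place)."""
    return math.exp(-haversine_km(a, b) / scale_km)


def temporal_similarity(a: Post, b: Post, scale_s: float = 3600.0) -> float:
    """Assumed form: decays exponentially with the time gap (1.0 = same instant)."""
    return math.exp(-abs(a.timestamp - b.timestamp) / scale_s)


def content_similarity(a: Post, b: Post) -> float:
    """Assumed form: Jaccard overlap of lowercased whitespace tokens."""
    ta, tb = set(a.text.lower().split()), set(b.text.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def pair_features(posts_a: list[Post], posts_b: list[Post]) -> list[float]:
    """Feature vector [spatial, temporal, content] for two accounts.

    Each feature is the best (maximum) similarity over all cross-account
    post pairs; this aggregation is an assumption for illustration.
    """
    pairs = [(p, q) for p in posts_a for q in posts_b]
    if not pairs:
        return [0.0, 0.0, 0.0]
    measures = (spatial_similarity, temporal_similarity, content_similarity)
    return [max(f(p, q) for p, q in pairs) for f in measures]
```

In a supervised setting, vectors like the one returned by `pair_features` would be computed for account pairs with known labels (same user or not) and passed to a classifier of one's choice. The abstract does not say which classifier the authors used.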
Pages: 104-115
Page count: 12