Matching user identities across social networks with limited profile data

被引:6
作者
Nurgaliev, Ildar [1 ,2 ]
Qu, Qiang [1 ]
Bamakan, Seyed Mojtaba Hosseini [1 ,3 ]
Muzammal, Muhammad [1 ,4 ]
机构
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
[2] Sberbank, Moscow 121170, Russia
[3] Yazd Univ, Dept Management, Yazd 89195741, Iran
[4] Bahria Univ, Dept Comp Sci, Islamabad 44000, Pakistan
关键词
social networks; user identity linkage; graph structure learning; maximum subgraph matching; graph percolation; MAXIMUM-LIKELIHOOD; DISTANCE; ALGORITHM; SUBGRAPH; GRAPHS;
D O I
10.1007/s11704-019-8235-9
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Privacy preservation is a primary concern in social networks which employ a variety of privacy preservations mechanisms to preserve and protect sensitive user information including age, location, education, interests, and others. The task of matching user identities across different social networks is considered a challenging task. In this work, we propose an algorithm to reveal user identities as a set of linked accounts from different social networks using limited user profile data, i.e., user-name and friendship. Thus, we propose a framework, ExpandUIL, that includes three standalone algorithms based on (i) the percolation graph matching in ExpandFullName algorithm, (ii) a supervised machine learning algorithm that works with the graph embedding, and (iii) a combination of the two, ExpandUserLinkage algorithm. The proposed framework as a set of algorithms is significant as, (i) it is based on the network topology and requires only name feature of the nodes, (ii) it requires a considerably low initial seed, as low as one initial seed suffices, (iii) it is iterative and scalable with applicability to online incoming stream graphs, and (iv) it has an experimental proof of stability over a real ground-truth dataset. Experiments on real datasets, Instagram and VK social networks, show upto 75% recall for linked accounts with 96% accuracy using only one given seed pair.
引用
收藏
页数:14
相关论文
共 42 条
[1]  
Al-Azizy Dalal, 2015, LECT NOTES COMPUTER, P36, DOI DOI 10.1007/978-3-319-31811-0
[2]   Graph matching and clustering using kernel attributes [J].
Angel Lozano, Miguel ;
Escolano, Francisco .
NEUROCOMPUTING, 2013, 113 :177-194
[3]  
[Anonymous], ACM MULTIMEDIA
[4]   Emergence of scaling in random networks [J].
Barabási, AL ;
Albert, R .
SCIENCE, 1999, 286 (5439) :509-512
[5]   Two density-based k-means initialization algorithms for non-metric data clustering [J].
Bianchi, Filippo Maria ;
Livi, Lorenzo ;
Rizzi, Antonello .
PATTERN ANALYSIS AND APPLICATIONS, 2016, 19 (03) :745-763
[6]   A Granular Computing approach to the design of optimized graph classification systems [J].
Bianchi, Filippo Maria ;
Livi, Lorenzo ;
Rizzi, Antonello ;
Sadeghian, Alireza .
SOFT COMPUTING, 2014, 18 (02) :393-412
[7]   A graph distance metric based on the maximal common subgraph [J].
Bunke, H ;
Shearer, K .
PATTERN RECOGNITION LETTERS, 1998, 19 (3-4) :255-259
[8]   Inexact graph matching for structural pattern recognition [J].
Bunke, H. ;
Allermann, G. .
PATTERN RECOGNITION LETTERS, 1983, 1 (04) :245-253
[9]   On a relation between graph edit distance and maximum common subgraph [J].
Bunke, H .
PATTERN RECOGNITION LETTERS, 1997, 18 (08) :689-694
[10]   User identification for cross-system personalisation [J].
Carmagnola, Francesca ;
Cena, Federica .
INFORMATION SCIENCES, 2009, 179 (1-2) :16-32