Rapid Re-Identification Risk Assessment for Anonymous Data Set in Mobile Multimedia Scene

Cited by: 5

Authors:

Yang, Zhigang [1,2,3,4]

Wang, Ruyan [1,3,4]

Luo, Daizhong [2]

Xiong, Yu [2]

Affiliations:

[1] Chongqing Univ Posts & Telecommun, Sch Commun & Informat Engn, Chongqing 400065, Peoples R China

[2] Chongqing Univ Arts & Sci, Sch Artificial Intelligence, Chongqing 402160, Peoples R China

[3] Key Lab Opt Commun & Networks, Chongqing 400065, Peoples R China

[4] Key Lab Ubiquitous Sensing & Networking, Chongqing 400065, Peoples R China

Source:

IEEE ACCESS | 2020 / Vol. 8

Funding:

National Natural Science Foundation of China;

Keywords:

Data privacy; Data models; Trajectory; Risk management; Couplings; Privacy; Multimedia systems; Multimedia; privacy; overall re-identification risk; attribute dependency; DE-ANONYMIZATION;

DOI:

10.1109/ACCESS.2020.2977404

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Ubiquitous mobile multimedia applications bring great convenience to users. However, when enjoying mobile multimedia services, users provide personal data to service platforms. Although the service platforms always claim that the collected personal data are de-identified, the risk of re-identifying users through linkage attacks still exists and is incalculable. This paper proposes a rapid prediction model for the overall re-identification risk based on the statistics of data sets (i.e., the number of individuals, number of attributes, distribution of attribute values, and attribute dependency). The proposed model reveals the impact of these statistics on the overall re-identification risk and adopts random sampling and semi-random sampling methods to predict the overall re-identification risk of data sets without and with strong dependency ordered attribute pairs, respectively. Experimental results show that for data sets without strong dependency ordered attribute pairs, the random sampling method has a high prediction accuracy (the prediction error is less than 0.05). For data sets with strong dependency ordered attribute pairs, the semi-random sampling method has a high prediction accuracy (the prediction error is less than 0.09). Using this model, governments and individuals can quickly assess the privacy leakage risk of their data sets, given only the statistics of the data sets. In addition, the model can evaluate the privacy risk of data collection schemes in advance according to historical statistics, and identify suspected services.
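The abstract does not spell out the risk measure or the sampling procedure, so the following is only a minimal sketch of the general idea of predicting risk from statistics alone by random sampling. It rests on assumptions that are not from the paper: the overall re-identification risk is taken to be the fraction of records whose attribute combination is unique, attributes are treated as independent (the case without strong dependency pairs), and the function name `estimate_uniqueness_risk` and its parameters are hypothetical.

```python
import random
from collections import Counter


def estimate_uniqueness_risk(n_individuals, attribute_distributions,
                             n_trials=20, seed=0):
    """Monte Carlo estimate of the fraction of unique records.

    Assumption (not from the paper): risk = share of records whose
    attribute combination appears exactly once in the data set.
    attribute_distributions is a list with one probability list per
    attribute; attributes are sampled independently of each other.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Draw every attribute column from its marginal distribution.
        columns = [
            rng.choices(range(len(probs)), weights=probs, k=n_individuals)
            for probs in attribute_distributions
        ]
        # Recombine the columns into one tuple per synthetic individual.
        records = list(zip(*columns))
        counts = Counter(records)
        unique = sum(1 for record in records if counts[record] == 1)
        total += unique / n_individuals
    return total / n_trials


if __name__ == "__main__":
    one_attribute = [[0.5, 0.5]]
    many_attributes = [[0.5, 0.5]] * 20
    print(estimate_uniqueness_risk(100, one_attribute))
    print(estimate_uniqueness_risk(100, many_attributes))
```

Under these assumptions, adding attributes or flattening the value distributions pushes the estimate toward 1, while adding individuals pushes it toward 0, which is the kind of dependence on data set statistics the abstract refers to. Handling strong dependency between attributes would require sampling the dependent attribute conditionally, which this sketch does not attempt.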


Pages: 41557-41565

Number of pages: 9
