Rapid Re-Identification Risk Assessment for Anonymous Data Set in Mobile Multimedia Scene

Cited by: 5
Authors
Yang, Zhigang [1 ,2 ,3 ,4 ]
Wang, Ruyan [1 ,3 ,4 ]
Luo, Daizhong [2 ]
Xiong, Yu [2 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Sch Commun & Informat Engn, Chongqing 400065, Peoples R China
[2] Chongqing Univ Arts & Sci, Sch Artificial Intelligence, Chongqing 402160, Peoples R China
[3] Key Lab Opt Commun & Networks, Chongqing 400065, Peoples R China
[4] Key Lab Ubiquitous Sensing & Networking, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Data privacy; Data models; Trajectory; Risk management; Couplings; Privacy; Multimedia systems; Multimedia; privacy; overall re-identification risk; attribute dependency; DE-ANONYMIZATION;
DOI
10.1109/ACCESS.2020.2977404
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Ubiquitous mobile multimedia applications bring great convenience to users. However, to use mobile multimedia services, users must provide personal data to service platforms. Although the platforms typically claim that the collected personal data are de-identified, the risk of re-identifying users through linkage attacks remains and is hard to quantify. This paper proposes a rapid prediction model for the overall re-identification risk based on the statistics of a data set (i.e., the number of individuals, the number of attributes, the distribution of attribute values, and attribute dependency). The model reveals how these statistics affect the overall re-identification risk, and it adopts a random sampling method for data sets without strongly dependent ordered attribute pairs and a semi-random sampling method for data sets with them. Experimental results show that for data sets without strongly dependent ordered attribute pairs, the random sampling method achieves high prediction accuracy (prediction error below 0.05). For data sets with strongly dependent ordered attribute pairs, the semi-random sampling method achieves high prediction accuracy (prediction error below 0.09). Using the model, governments and individuals can quickly assess the privacy leakage risk of their data sets given only the statistics of those data sets. The model can also evaluate the privacy risk of a data collection scheme in advance from historical statistics and identify suspect services.
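The idea of predicting risk from statistics alone can be illustrated with a minimal sketch. It is not the authors' model: it assumes that the overall re-identification risk is measured as the fraction of individuals whose combination of attribute values is unique, and that attributes are independent (the case the abstract handles with random sampling). The function name `estimate_uniqueness_risk` and its parameters are hypothetical. The semi-random sampling method for dependent attribute pairs is not covered here.

```python
import random
from collections import Counter


def estimate_uniqueness_risk(n_individuals, attribute_distributions,
                             trials=20, seed=0):
    """Estimate the fraction of unique records by random sampling.

    Assumptions (not taken from the paper): risk is the share of
    individuals with a unique attribute combination, and attributes
    are sampled independently of each other.

    attribute_distributions: one dict per attribute, mapping each
    attribute value to its relative frequency (weight).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Draw one synthetic column per attribute from its value distribution.
        columns = [
            rng.choices(list(dist.keys()), weights=list(dist.values()),
                        k=n_individuals)
            for dist in attribute_distributions
        ]
        # Combine the columns into synthetic records, one per individual.
        records = list(zip(*columns))
        counts = Counter(records)
        # A record that occurs exactly once is treated as re-identifiable.
        unique = sum(1 for record in records if counts[record] == 1)
        total += unique / n_individuals
    # Average over the trials to reduce sampling noise.
    return total / trials


if __name__ == "__main__":
    uniform10 = {value: 1.0 for value in range(10)}
    few = estimate_uniqueness_risk(100, [{"a": 0.5, "b": 0.5}])
    many = estimate_uniqueness_risk(100, [uniform10] * 5)
    print(f"1 binary attribute:     {few:.3f}")
    print(f"5 ten-valued attributes: {many:.3f}")
```

Under these assumptions, adding attributes or attribute values spreads the same individuals over more possible combinations, so the estimated share of unique records rises. Only the data set statistics are used as input; no real records are needed.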
Pages: 41557-41565
Page count: 9