Disclosure Risk and Data Utility for Partially Synthetic Data: An Empirical Study Using the German IAB Establishment Survey

Authors
Drechsler, Joerg [1]
Reiter, J. P. [2]
Affiliations
[1] Inst Employment Res, D-90478 Nurnberg, Germany
[2] Duke Univ, Dept Stat Sci, Durham, NC 27708 USA
Funding
U.S. National Science Foundation
Keywords
Confidentiality; disclosures; multiple imputation; synthetic data; identification disclosure; microdata
DOI
Not available
Chinese Library Classification
O1 [Mathematics]; C [Social Sciences, General]
Subject classification codes
03; 0303; 0701; 070101
Abstract
Statistical agencies that disseminate data to the public must protect the confidentiality of respondents' identities and sensitive attributes. To satisfy these requirements, agencies can release the units originally surveyed with some values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple imputations. These are called partially synthetic data. In this article, we empirically examine trade-offs between inferential accuracy and confidentiality risks for partially synthetic data, with emphasis on the role of the number of released datasets. We also present a two-stage imputation scheme that allows agencies to release different numbers of imputations for different variables. This scheme can result in lower disclosure risks and higher data utility than the typical one-stage imputation with the same number of released datasets. The empirical analyses are based on partial synthesis of the German IAB Establishment Survey.
Pages: 589-603
Number of pages: 15