Disclosure Risk and Data Utility for Partially Synthetic Data: An Empirical Study Using the German IAB Establishment Survey

Authors
Drechsler, Joerg [1 ]
Reiter, J. P. [2 ]
Affiliations
[1] Inst Employment Res, D-90478 Nurnberg, Germany
[2] Duke Univ, Dept Stat Sci, Durham, NC 27708 USA
Funding
U.S. National Science Foundation
Keywords
Confidentiality; disclosures; multiple imputation; synthetic data; MULTIPLE-IMPUTATION; IDENTIFICATION DISCLOSURE; MICRODATA;
DOI
Not available
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences, General]
Discipline classification codes
03 ; 0303 ; 0701 ; 070101 ;
Abstract
Statistical agencies that disseminate data to the public must protect the confidentiality of respondents' identities and sensitive attributes. To satisfy these requirements, agencies can release the units originally surveyed with some values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple imputations. These are called partially synthetic data. In this article, we empirically examine trade-offs between inferential accuracy and confidentiality risks for partially synthetic data, with emphasis on the role of the number of released datasets. We also present a two-stage imputation scheme that allows agencies to release different numbers of imputations for different variables. This scheme can result in lower disclosure risks and higher data utility than the typical one-stage imputation with the same number of released datasets. The empirical analyses are based on partial synthesis of the German IAB Establishment Survey.
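The role of the number of released datasets m can be illustrated with a small sketch. It assumes a single hypothetical sensitive variable y that is replaced in every record by draws from the posterior predictive distribution of a normal linear regression on a retained variable x (flat prior). It then applies the commonly used combining rules for partially synthetic data (Reiter 2003): with per-dataset estimates q_l and variances u_l, the point estimate is the mean q̄_m, and the variance is T_p = ū_m + b_m/m, where b_m is the between-dataset variance. The data, the model, and the function names `synthesize` and `combine` are illustrative assumptions. The abstract does not specify the imputation models used for the IAB Establishment Survey, and the two-stage scheme is not implemented here.

```python
import numpy as np


def synthesize(x, y, m, rng):
    """Return m partially synthetic copies of y (x is released unchanged).

    Hypothetical model: y ~ N(b0 + b1*x, sigma^2) with a flat prior on
    (beta, log sigma). Each copy uses fresh posterior draws of the parameters.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    df = n - X.shape[1]
    s2 = resid @ resid / df
    xtx_inv = np.linalg.inv(X.T @ X)
    copies = []
    for _ in range(m):
        # sigma^2 | data follows a scaled inverse chi-square distribution
        sigma2 = s2 * df / rng.chisquare(df)
        # beta | sigma^2, data ~ N(beta_hat, sigma^2 (X'X)^-1)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        # replace every observed y with a synthetic draw
        copies.append(X @ beta + rng.normal(0.0, np.sqrt(sigma2), n))
    return copies


def combine(q, u):
    """Partially synthetic combining rules: (q_bar, T_p, degrees of freedom)."""
    q = np.asarray(q, dtype=float)
    u = np.asarray(u, dtype=float)
    m = len(q)
    if m < 2:
        raise ValueError("need at least two synthetic datasets")
    q_bar = q.mean()
    b = q.var(ddof=1)          # between-dataset variance b_m
    u_bar = u.mean()           # average within-dataset variance
    T_p = u_bar + b / m        # the b_m / m term shrinks as m grows
    nu = (m - 1) * (1 + u_bar / (b / m)) ** 2
    return q_bar, T_p, nu


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    n = 500
    x = rng.normal(size=n)                    # retained variable
    y = 2.0 + 3.0 * x + rng.normal(size=n)    # hypothetical sensitive variable
    for m in (3, 10):
        copies = synthesize(x, y, m, rng)
        q = [c.mean() for c in copies]                # estimate of the mean of y
        u = [c.var(ddof=1) / n for c in copies]       # its estimated variance
        q_bar, T_p, nu = combine(q, u)
        print(f"m={m}: estimate={q_bar:.3f}, variance={T_p:.5f}, df={nu:.1f}")
```

Raising m reduces the b_m/m component of T_p and so improves inferential accuracy. At the same time, each additional released copy gives an intruder more draws from which to learn about the original values. This is the utility-versus-risk trade-off the study examines empirically.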
Pages: 589–603
Page count: 15