Disclosure Risk and Data Utility for Partially Synthetic Data: An Empirical Study Using the German IAB Establishment Survey

Cited by: 0

Authors:

Drechsler, Joerg [1]

Reiter, J. P. [2]

Affiliations:

[1] Inst Employment Res, D-90478 Nurnberg, Germany

[2] Duke Univ, Dept Stat Sci, Durham, NC 27708 USA

Source:

JOURNAL OF OFFICIAL STATISTICS | 2009 / Vol. 25 / Issue 04

Funding:

U.S. National Science Foundation;

Keywords:

Confidentiality; disclosures; multiple imputation; synthetic data; MULTIPLE-IMPUTATION; IDENTIFICATION DISCLOSURE; MICRODATA;

DOI:

Not available

Chinese Library Classification (CLC) code:

O1 [Mathematics]; C [Social Sciences, General];

Discipline classification code:

03 ; 0303 ; 0701 ; 070101 ;

Abstract:

Statistical agencies that disseminate data to the public must protect the confidentiality of respondents' identities and sensitive attributes. To satisfy these requirements, agencies can release the units originally surveyed with some values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple imputations. These are called partially synthetic data. In this article, we empirically examine trade-offs between inferential accuracy and confidentiality risks for partially synthetic data, with emphasis on the role of the number of released datasets. We also present a two-stage imputation scheme that allows agencies to release different numbers of imputations for different variables. This scheme can result in lower disclosure risks and higher data utility than the typical one-stage imputation with the same number of released datasets. The empirical analyses are based on partial synthesis of the German IAB Establishment Survey.

Pages: 589-603

Page count: 15
