Generating partially synthetic geocoded public use data with decreased disclosure risk by using differential smoothing

Times cited: 8
Authors
Quick, Harrison [1]
Holan, Scott H. [2, 3]
Wikle, Christopher K. [2]
Affiliations
[1] Drexel Univ, Philadelphia, PA 19104 USA
[2] Univ Missouri, Columbia, MO 65211 USA
[3] US Census Bur, Washington, DC USA
Funding
National Science Foundation (USA)
Keywords
Bayesian methods; Data privacy; Multiple imputation; Spatial modelling; Synthetic data
DOI
10.1111/rssa.12360
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences, General]
Subject classification code
03; 0303; 0701; 070101
Abstract
When collecting geocoded confidential data with the intent to disseminate, agencies often resort to altering the geographies before making data publicly available. An alternative to releasing aggregated and/or perturbed data is to release synthetic data, where sensitive values are replaced with draws from models designed to capture distributional features in the data collected. The issues associated with spatially outlying observations in the data, however, have received relatively little attention. Our goal here is to shed light on this problem, to propose a solution, referred to as 'differential smoothing', and to illustrate our approach by using sale prices of homes in San Francisco.
Pages: 649-661
Number of pages: 13
Related papers
7 items in total
  • [1] Disclosure Risk and Data Utility for Partially Synthetic Data: An Empirical Study Using the German IAB Establishment Survey
    Drechsler, Joerg
    Reiter, J. P.
    JOURNAL OF OFFICIAL STATISTICS, 2009, 25 (04) : 589 - 603
  • [2] A comparison of synthetic data approaches using utility and disclosure risk measures
    An, Seongbin
    Doan, Trang
    Lee, Juhee
    Kim, Jiwoo
    Kim, Yong Jae
    Kim, Yunji
    Yoon, Changwon
    Jung, Sungkyu
    Kim, Dongha
    Kwon, Sunghoon
    Kim, Hang J.
    Ahn, Jeongyou
    Park, Cheolwo
    KOREAN JOURNAL OF APPLIED STATISTICS, 2023, 36 (02) : 141 - 166
  • [3] Disclosure control using partially synthetic data for large-scale health surveys, with applications to CanCORS
    Loong, Bronwyn
    Zaslavsky, Alan M.
    He, Yulei
    Harrington, David P.
    STATISTICS IN MEDICINE, 2013, 32 (24) : 4139 - 4161
  • [4] Generating synthetic personal health data using conditional generative adversarial networks combining with differential privacy
    Sun, Chang
    van Soest, Johan
    Dumontier, Michel
    JOURNAL OF BIOMEDICAL INFORMATICS, 2023, 143
  • [5] Generating synthetic data to produce public-use microdata for small geographic areas based on complex sample survey data with application to the National Health Interview Survey
    Sakshaug, Joseph W.
    Raghunathan, Trivellore E.
    JOURNAL OF APPLIED STATISTICS, 2014, 41 (10) : 2103 - 2122
  • [6] Bayesian marked point process modeling for generating fully synthetic public use data with point-referenced geography
    Quick, Harrison
    Holan, Scott H.
    Wikle, Christopher K.
    Reiter, Jerome P.
    SPATIAL STATISTICS, 2015, 14 : 439 - 451
  • [7] RoD: Evaluating the Risk of Data Disclosure Using Noise Estimation for Differential Privacy
    Tsou, Yao-Tung
    Chen, Hung-Li
    Chen, Jia-Yang
    IEEE TRANSACTIONS ON BIG DATA, 2021, 7 (01) : 214 - 226