Zeros and ones: a case for suppressing zeros in sensitive count data with an application to stroke mortality

Cited by: 8
Authors
Quick, Harrison [1 ]
Holan, Scott H. [2 ]
Wikle, Christopher K. [2 ]
Affiliations
[1] Ctr Dis Control & Prevent, Div Heart Dis & Stroke Prevent, Atlanta, GA 30329 USA
[2] Univ Missouri, Dept Stat, Columbia, MO 65211 USA
Source
STAT | 2015 / Volume 4 / Issue 1
Funding
National Science Foundation (USA);
Keywords
Bayesian methods; data privacy; disclosure limitation; spatial data analysis; synthetic data;
DOI
10.1002/sta4.92
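Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes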
020208 ; 070103 ; 0714 ;
Abstract
In the current era of global internet connectivity, privacy concerns are of the utmost importance. When official statistical agencies collect spatially referenced, confidential data that they intend to release as public-use files, the suppression of small counts is a common measure that agencies take to protect the confidentiality of the data-subjects from ill-intentioned users. The goal of this paper is to demonstrate that an interval suppression criterion that does not suppress zeros can fail to protect regions with a single occurrence. We illustrate the difference in disclosure risk between an interval suppression criterion and a one-sided suppression criterion by considering a US county-level dataset composed of the number of deaths due to stroke in White men. Here, we illustrate that an interval suppression criterion leads to a twofold increase in the disclosure risk when compared with a one-sided suppression criterion for regions with a single incidence among a population of less than 600. We conclude with an extension of these findings beyond stroke mortality and by offering general guidelines for data suppression. Copyright (C) 2015 John Wiley & Sons, Ltd.
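The two suppression criteria contrasted in the abstract can be illustrated with a minimal sketch. The threshold of 10, the function names, and the sample counts are all assumptions chosen for illustration; they are not taken from the paper. Under the interval rule, zeros are published, so any suppressed cell is known to hold at least one occurrence. Under the one-sided rule, a suppressed cell may also be zero.

```python
from typing import Optional

# Hypothetical threshold: counts below this value are considered "small".
# The value 10 is an assumption for illustration, not taken from the paper.
THRESHOLD = 10


def interval_suppress(count: int, threshold: int = THRESHOLD) -> Optional[int]:
    """Interval criterion: suppress counts in 1..threshold-1, publish zeros."""
    return None if 0 < count < threshold else count


def one_sided_suppress(count: int, threshold: int = THRESHOLD) -> Optional[int]:
    """One-sided criterion: suppress every count below the threshold, zeros included."""
    return None if count < threshold else count


def possible_values(released: Optional[int], zeros_suppressed: bool,
                    threshold: int = THRESHOLD) -> range:
    """Values a data user can infer for a cell from what was released."""
    if released is not None:
        return range(released, released + 1)
    lowest = 0 if zeros_suppressed else 1
    return range(lowest, threshold)


# Made-up counts for four regions.
counts = [0, 1, 4, 12]
interval_release = [interval_suppress(c) for c in counts]
one_sided_release = [one_sided_suppress(c) for c in counts]

# For the region with a single occurrence, compare what a user can rule out.
interval_candidates = possible_values(interval_release[1], zeros_suppressed=False)
one_sided_candidates = possible_values(one_sided_release[1], zeros_suppressed=True)
```

In this sketch the region with a single occurrence is suppressed under both rules, but only the one-sided rule leaves zero among the candidate values, which is the gap in protection that the abstract describes.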
Pages: 227-234
Number of pages: 8