Balancing data privacy and usability in the federal statistical system

Cited by: 25
Authors
Hotz, V. Joseph [1]
Bollinger, Christopher R. [2]
Komarova, Tatiana [3]
Manski, Charles F. [4]
Moffitt, Robert A. [5]
Nekipelov, Denis [6]
Sojourner, Aaron [7]
Spencer, Bruce D. [8]
Affiliations
[1] Duke Univ, Dept Econ, Durham, NC 27708 USA
[2] Univ Kentucky, Dept Econ, Lexington, KY 40503 USA
[3] London Sch Econ & Polit Sci, London WC2A 3PH, England
[4] Northwestern Univ, Dept Econ, Evanston, IL 60208 USA
[5] Johns Hopkins Univ, Dept Econ, Baltimore, MD 21211 USA
[6] Univ Virginia, Dept Econ, Charlottesville, VA 22904 USA
[7] WE Upjohn Inst Employment Policy, Kalamazoo, MI 49007 USA
[8] Northwestern Univ, Dept Stat & Data Sci, Evanston, IL 60208 USA
Keywords
DIFFERENTIAL PRIVACY; ECONOMIC-ANALYSIS; DISCLOSURE; CONFIDENTIALITY; CENSUS; QUALITY; RISK
DOI
10.1073/pnas.2104906119
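Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract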
The federal statistical system is experiencing competing pressures for change. On the one hand, for confidentiality reasons, much socially valuable data currently held by federal agencies is either not made available to researchers at all or made available only under onerous conditions. On the other hand, agencies that release public databases face new challenges in protecting the privacy of the subjects in those databases, which leads them to consider releasing less data or masking the data in ways that reduce their accuracy. In this essay, we argue that the discussion has not given proper consideration to the reduced social benefits of data availability and usability relative to the value of increased levels of privacy protection. A more balanced benefit-cost framework should be used to assess these trade-offs. We express concerns both with synthetic data methods for disclosure limitation, which will reduce the types of research that can be reliably conducted in unknown ways, and with differential privacy criteria that use what we argue is an inappropriate measure of disclosure risk. We recommend that the measure of disclosure risk used to assess all disclosure protection methods focus on what we believe is the risk that individuals should care about; that more study of the impact of differential privacy criteria and synthetic data methods on data usability for research be conducted before either is put into widespread use; and that more research be conducted on alternative methods of disclosure risk reduction that better balance benefits and costs.
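The privacy-accuracy trade-off the abstract refers to can be made concrete with a textbook case. The sketch below is not taken from the paper and is not the Census Bureau's production algorithm; it assumes the standard Laplace mechanism for a single counting query (sensitivity 1), where the noise scale is b = sensitivity / epsilon, so a smaller epsilon (stronger privacy protection) means larger expected error in the released count. The true count of 1000 and the helper names are hypothetical and chosen for illustration only.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace(0, scale) distributed.
    # Illustration only: naive floating-point sampling like this is not
    # considered safe for production differential-privacy systems.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random,
                sensitivity: float = 1.0) -> float:
    # Laplace mechanism (assumed here): add noise with scale sensitivity / epsilon.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def theoretical_mean_abs_error(epsilon: float, sensitivity: float = 1.0) -> float:
    # For Laplace(0, b), E|X| = b, so the expected absolute error is sensitivity / epsilon.
    return sensitivity / epsilon


def empirical_mean_abs_error(true_count: int, epsilon: float,
                             trials: int, seed: int = 0) -> float:
    # Average absolute error of repeated noisy releases of the same hypothetical count.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    hypothetical_count = 1000
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps:>5}: "
              f"theoretical={theoretical_mean_abs_error(eps):.3f}, "
              f"empirical={empirical_mean_abs_error(hypothetical_count, eps, 20000):.3f}")
```

Running the script prints, for each epsilon, the theoretical and simulated mean absolute error. Cutting epsilon by a factor of ten multiplies the expected error by ten. A benefit-cost assessment of the kind the abstract calls for would weigh that loss of accuracy against the corresponding reduction in disclosure risk.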
Pages: 10