Controlled Shuffling, Statistical Confidentiality and Microdata Utility: A Successful Experiment with a 10% Household Sample of the 2011 Population Census of Ireland for the IPUMS-International Database

Cited by: 0
Authors
McCaa, Robert [1]
Muralidhar, Krishnamurty [1]
Sarathy, Rathindra [1]
Comerford, Michael [1]
Esteve-Palos, Albert [1]
Affiliation
[1] Minnesota Populat Ctr, Minneapolis, MN 55455 USA
Source
PRIVACY IN STATISTICAL DATABASES, PSD 2014 | 2014 / Vol. 8744
Keywords
controlled shuffling; population census; microdata sample; data privacy; data utility; statistical disclosure controls; IPUMS-International; Ireland; DISCLOSURE CONTROL;
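DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract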
IPUMS-International disseminates more than two hundred fifty integrated, confidentialized census microdata samples to thousands of researchers worldwide at no cost. The number of samples is increasing at the rate of several dozen per year, as quickly as the task of integrating metadata and microdata is completed. Protecting the statistical confidentiality and privacy of individuals represented in the microdata is a sine qua non of the IPUMS project. For the 2010 round of censuses, even greater protections are required, while researchers are demanding ever higher precision and utility. This paper describes a tripartite collaborative experiment using a ten percent household sample of the 2011 census of Ireland to estimate risk, mask the microdata using controlled shuffling, and assess analytical utility by comparing the masked data against the unprotected source microdata. Controlled shuffling exploits hierarchically ordered coding schemes to protect privacy and enhance utility. With controlled shuffling, the lesson seems to be that more detail means less risk and greater utility. Overall, despite substantial perturbation of the masked dataset (30% of adults on one or more characteristics), we find that data utility is very high and information loss is slight, even for fairly complex analytical problems.
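The abstract does not spell out the shuffling procedure itself, so the following is only a toy sketch of the general idea of masking by reassigning values among records that are close in rank, and of the two checks the abstract mentions: the share of records perturbed and a comparison of the masked data against the source. The function names (`toy_rank_shuffle`, `perturbation_rate`), the `window` parameter, and the sample ages are all hypothetical and are not taken from the paper; the authors' controlled shuffling method may differ substantially.

```python
import random
from collections import Counter


def toy_rank_shuffle(values, window=3, seed=0):
    """Permute values among records that are neighbours in rank order.

    Hypothetical illustration only (not the paper's algorithm): record
    indices are sorted by value, and values are shuffled within
    consecutive blocks of `window` ranks.
    """
    rng = random.Random(seed)
    order = sorted(range(len(values)), key=lambda i: values[i])
    masked = list(values)
    for start in range(0, len(order), window):
        block = order[start:start + window]
        block_values = [values[i] for i in block]
        rng.shuffle(block_values)
        for i, v in zip(block, block_values):
            masked[i] = v
    return masked


def perturbation_rate(original, masked):
    """Share of records whose masked value differs from the source value."""
    changed = sum(1 for o, m in zip(original, masked) if o != m)
    return changed / len(original)


# Made-up ages standing in for one ordered characteristic.
ages = [23, 37, 41, 29, 65, 52, 34, 48, 71, 30]
masked_ages = toy_rank_shuffle(ages, window=3, seed=1)

rate = perturbation_rate(ages, masked_ages)
# A pure reassignment of existing values leaves the marginal
# distribution of the variable unchanged.
same_marginal = Counter(ages) == Counter(masked_ages)

print(f"share of records changed: {rate:.0%}")
print("marginal distribution preserved:", same_marginal)
```

Because the masked values are a permutation of the source values, univariate tabulations are identical before and after masking in this sketch; what changes is which record carries which value. A smaller `window` keeps each value closer to its original rank, which limits distortion of relationships with other variables at the cost of weaker masking.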
Pages: 326-337
Page count: 12