Improved secondary analysis of linked data: a framework and an illustration

Cited by: 16
Authors
Chambers, Ray [1]
da Silva, Andrea Diniz [2, 3]
Institutions
[1] Natl Inst Appl Stat Res Australia, Wollongong, NSW, Australia
[2] Inst Brasileiro Geog & Estat, Rio De Janeiro, Brazil
[3] Escola Nacl Ciencias Estat, Rio De Janeiro, Brazil
Keywords
Bias correction; Classification analysis; Linkage error; Paradata; Probability linkage; Record linkage; Regression analysis; Population
DOI
10.1111/rssa.12477
Chinese Library Classification (CLC) numbers
O1 [Mathematics]; C [Social Sciences, General]
Subject classification codes
03; 0303; 0701; 070101
Abstract
Applications that use linked data are now part of mainstream social science research, but they generally do not take linkage error into account. Solutions that correct for the bias caused by these errors have been proposed but are not yet embedded in the analysis procedures in common use. Secondary analyses based on linked data can therefore be misleading. We review some recent approaches to non-deterministic data linkage, together with a framework for secondary analysis of linked data that uses the paradata produced by the linkage process to correct this bias. We also describe a new method for secondary analysis of linked data that builds on this framework, and we show how it can be used to estimate a set of domain means from linked data. We then illustrate this approach via an empirical study based on record linkage of agricultural producers in four states of Brazil, aimed at producing estimates of agricultural output by industry. Our study considers register-to-register as well as sample-to-register linkage, and we show results both for the traditional Fellegi-Sunter approach to record linkage and for a newer linkage procedure based on classification trees and bagging.
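To make the idea of "using linkage paradata to correct bias in domain means" concrete, the sketch below shows a simple, generic moment correction. It is NOT the new method described in this article. It assumes an exchangeable linkage error (ELE) model within a single block of m records: one-to-one linkage, a known probability lam that a record is linked correctly (the kind of quantity linkage paradata might supply), an equal probability gamma = (1 - lam) / (m - 1) of linkage to any specific wrong record, and an error-free domain indicator on the file that receives the linked values. Under these assumptions the expected naive domain mean is (lam - gamma) * ybar_d + gamma * m * ybar, which can be inverted. All function names are hypothetical.

```python
"""Illustrative sketch only: bias-corrected domain means under an ASSUMED
single-block exchangeable linkage error (ELE) model. Not the article's method."""

from typing import List, Sequence


def ele_gamma(lam: float, m: int) -> float:
    """Probability of linking to one specific incorrect record: (1 - lam) / (m - 1)."""
    if m < 2:
        raise ValueError("block must contain at least two records")
    return (1.0 - lam) / (m - 1)


def expected_linked_values(y: Sequence[float], lam: float) -> List[float]:
    """E(y*) = [(lam - gamma) I + gamma 1 1'] y under the assumed ELE model."""
    gamma = ele_gamma(lam, len(y))
    total = sum(y)
    return [(lam - gamma) * yi + gamma * total for yi in y]


def naive_domain_mean(y_linked: Sequence[float], in_domain: Sequence[bool]) -> float:
    """Mean of the linked values over records flagged as belonging to the domain."""
    if len(y_linked) != len(in_domain):
        raise ValueError("y_linked and in_domain must have the same length")
    n_d = sum(1 for d in in_domain if d)
    if n_d == 0:
        raise ValueError("domain is empty")
    return sum(v for v, d in zip(y_linked, in_domain) if d) / n_d


def corrected_domain_mean(
    y_linked: Sequence[float], in_domain: Sequence[bool], lam: float
) -> float:
    """Invert E(naive mean) = (lam - gamma) * ybar_d + gamma * m * ybar."""
    gamma = ele_gamma(lam, len(y_linked))
    if lam <= gamma:
        raise ValueError("lam must exceed gamma (i.e. lam > 1/m) for the correction")
    naive = naive_domain_mean(y_linked, in_domain)
    # One-to-one linkage only permutes y, so the overall total (m * ybar)
    # is unaffected by linkage errors and can be taken from the linked file.
    total = sum(y_linked)
    return (naive - gamma * total) / (lam - gamma)


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0, 10.0]           # hypothetical true values
    domain = [True, True, False, False, True]      # hypothetical domain flags
    lam = 0.8                                      # assumed correct-link probability
    y_star = expected_linked_values(y_true, lam)   # idealised "linked" values
    print(round(naive_domain_mean(y_star, domain), 4),
          round(corrected_domain_mean(y_star, domain, lam), 4))  # prints 4.25 4.3333
```

Here the true domain mean is 13/3 (about 4.3333): the naive estimate is pulled toward the overall mean of 4.0, while the corrected estimate recovers the true value. In practice lam would itself be estimated, linkage is usually carried out within many blocks, and the article's framework handles these complications in its own way; the sketch only illustrates why paradata on linkage accuracy are needed for the correction.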
Pages: 37-59
Number of pages: 23