Improved secondary analysis of linked data: a framework and an illustration

Cited by: 16
Authors
Chambers, Ray [1]
da Silva, Andrea Diniz [2,3]
Affiliations
[1] National Institute for Applied Statistics Research Australia, Wollongong, NSW, Australia
[2] Instituto Brasileiro de Geografia e Estatística, Rio de Janeiro, Brazil
[3] Escola Nacional de Ciências Estatísticas, Rio de Janeiro, Brazil
Keywords
Bias correction; Classification analysis; Linkage error; Paradata; Probability linkage; Record linkage; Regression analysis; Population
DOI
10.1111/rssa.12477
Chinese Library Classification
O1 [Mathematics]; C [Social Sciences (General)]
Subject classification codes
03; 0303; 0701; 070101
Abstract
Applications that use linked data are now part of mainstream social science research, though they generally do not take linkage error into consideration. Solutions that correct for the bias caused by these errors have been proposed but are not yet embedded in commonly used analysis procedures. Secondary analyses based on linked data may therefore be misleading. We review some recent approaches to non-deterministic data linkage, together with a framework for secondary analysis of linked data that uses paradata produced by the linkage process to correct this bias. We also describe a new method for secondary analysis of linked data that builds on this framework, and show how it can be used to estimate a set of domain means from linked data. We then illustrate this approach with an empirical study, based on record linkage of agricultural producers in four Brazilian states, that aims to estimate agricultural output by industry. Our study considers both register-to-register and sample-to-register linkage, and we show results for the traditional Fellegi-Sunter approach to record linkage as well as for a newer linkage procedure based on classification trees and bagging.
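For readers unfamiliar with the Fellegi-Sunter approach mentioned in the abstract, the sketch below shows its textbook form: each candidate record pair receives a log2 likelihood-ratio weight summed over comparison fields, and two thresholds split pairs into links, possible links and non-links. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `fs_weight` and `classify`, the field names, and the m/u probabilities and thresholds in the usage example are all hypothetical; fields are assumed conditionally independent given match status. The sketch does not cover the classification-tree/bagging procedure or the paradata-based bias correction described in the paper.

```python
import math
from typing import Dict


def fs_weight(agreement: Dict[str, bool],
              m: Dict[str, float],
              u: Dict[str, float]) -> float:
    """Total Fellegi-Sunter weight (log2 likelihood ratio) for one record pair.

    agreement[f] is True if the pair agrees on comparison field f.
    m[f] = P(agree on f | true match), u[f] = P(agree on f | non-match).
    Assumption: fields are conditionally independent given match status,
    so the per-field log ratios can simply be added.
    """
    total = 0.0
    for field, agrees in agreement.items():
        if agrees:
            # Agreement contributes log2(m / u), usually positive.
            total += math.log2(m[field] / u[field])
        else:
            # Disagreement contributes log2((1 - m) / (1 - u)), usually negative.
            total += math.log2((1 - m[field]) / (1 - u[field]))
    return total


def classify(weight: float, upper: float, lower: float) -> str:
    """Assign a pair using two (hypothetical) thresholds, lower < upper."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


if __name__ == "__main__":
    # Hypothetical comparison fields and parameters, for illustration only.
    m = {"name": 0.8, "municipality": 0.75}
    u = {"name": 0.1, "municipality": 0.5}
    pair = {"name": True, "municipality": False}
    w = fs_weight(pair, m, u)  # about 3.0 + (-1.0) = 2.0
    print(round(w, 6), classify(w, upper=4.0, lower=0.0))
```

In practice the m and u probabilities are not known and are typically estimated (for example with an EM algorithm), and pairs in the "possible link" band are often sent to clerical review; both steps are outside this sketch.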
Pages: 37-59
Number of pages: 23