Improved secondary analysis of linked data: a framework and an illustration

Cited by: 16

Authors:

Chambers, Ray ^{[1]}

da Silva, Andrea Diniz ^{[2,3]}

Affiliations:

[1] Natl Inst Appl Stat Res Australia, Wollongong, NSW, Australia

[2] Inst Brasileiro Geog & Estat, Rio De Janeiro, Brazil

[3] Escola Nacl Ciencias Estat, Rio De Janeiro, Brazil

Source:

JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY | 2020 / Vol. 183 / Issue 01

Keywords:

Bias correction; Classification analysis; Linkage error; Paradata; Probability linkage; RECORD-LINKAGE; REGRESSION-ANALYSIS; POPULATION;

DOI:

10.1111/rssa.12477

Chinese Library Classification number:

O1 [Mathematics]; C [Social Sciences, General];

Discipline classification codes:

03 ; 0303 ; 0701 ; 070101 ;

Abstract:

Applications that use linked data are now part of mainstream social science research, though they generally do not take linkage error into consideration. Solutions that correct for the bias caused by these errors have been proposed but are not yet embedded in the various analysis procedures in common use. Secondary analyses based on linked data can therefore be potentially misleading. We review some recent approaches to non-deterministic data linkage together with a framework for secondary analysis of the linked data which makes use of para-data produced by the linkage process to correct this bias. We also describe a new method for secondary analysis of linked data that builds on this framework and show how it can be used for estimation of a set of domain means based on linked data. We then illustrate this approach via an empirical study based on record linkage of agricultural producers in four states of Brazil aimed at producing estimates of agricultural output by industry. Our study considers register-to-register linkage as well as sample-to-register linkage, and we show results for the traditional Fellegi-Sunter approach to record linkage as well as for a newer linkage procedure based on the use of classification trees and bagging.

Citation

Pages: 37-59

Number of pages: 23

58 references in total

[31] Instituto Brasileiro de Geografia e Estatistica, 2012, SINT HIST HIST CENS
[32] ADVANCES IN RECORD-LINKAGE METHODOLOGY AS APPLIED TO MATCHING THE 1985 CENSUS OF TAMPA, FLORIDA
JARO, MA
[J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1989, 84 (406) : 414 - 420
[33] Keim D, 2002, P 8 INT C KNOWL DIS
[34] Kim G., 2013, BIAS REDUCTION CORRE
[35] Unbiased regression estimation under correlated linkage errors
Kim, Gunky
Chambers, Raymond
[J]. STAT, 2015, 4 (01) : 32 - 45
[36] Regression analysis under incomplete linkage
Kim, Gunky
Chambers, Raymond
[J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2012, 56 (09) : 2756 - 2770
[37] Regression Analysis under Probabilistic Multi-Linkage
Kim, Gunky
Chambers, Raymond
[J]. STATISTICA NEERLANDICA, 2012, 66 (01) : 64 - 79
[38] Regression analysis with linked data
Lahiri, P
Larsen, MD
[J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2005, 100 (469) : 222 - 230
[39] Correlates of record linkage and estimating risks of non-linkage biases in business data sets
Moore, Jamie C.
Smith, Peter W. F.
Durrant, Gabriele B.
[J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, 2018, 181 (04) : 1211 - 1230
[40] AUTOMATIC LINKAGE OF VITAL RECORDS
NEWCOMBE, HB
KENNEDY, JM
AXFORD, SJ
JAMES, AP
[J]. SCIENCE, 1959, 130 (3381) : 954 - 959
