Making statistical inferences about linkage errors

Cited by: 1
Authors
Dasylva, Abel [1]
Goussanou, Arthur [1]
Affiliations
[1] Stat Canada, Methodol Branch, 100 Tunneys Pasture Driveway, Ottawa, ON K1A0T6, Canada
Keywords
Data integration; Non-sampling error; Probabilistic record linkage; Massive data sets; Models
DOI
10.1007/s42081-023-00228-9
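Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes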
020208 ; 070103 ; 0714 ;
Abstract
Record linkage aims to identify records that are from the same unit, in one or many sources. Sometimes, it is imperfect because the available identifying information is limited and erroneous. In such cases, it is important to report the linkage accuracy, which may be measured according to one of many proposed statistical models. These models offer clear advantages over clerical reviews, in terms of costs and timeliness. They also apply where clerical reviews are impossible, e.g., when two parties need to link their respective data sets, such that neither party can see the record pairs in the clear. For obvious reasons, these models must be validated before they are used, by performing goodness-of-fit tests. Unfortunately, this is currently difficult because all existing models rely on observations that are correlated. Thus, the chi-squared and likelihood ratio tests are biased. In fact, it is challenging to perform any kind of statistical inference about these models or their parameters. In this work, this long-standing problem is addressed when modeling the linkage errors through the number of links of a record. The proposed solution bases the inferences on a subset of observations that are approximately independent.
Pages: 17-56
Number of pages: 40