Imputation in data fusion of heterogeneous data sets: a model-based numerical experiment

Cited by: 2
Authors
Berchtold, Andre [1, 2]
Jeannin, Andre [1]
Affiliations
[1] Univ Hosp Ctr, Grp Rech Sante Adolescents, Lausanne, Switzerland
[2] Univ Lausanne, Inst Appl Math, Lausanne, Switzerland
Keywords
binary variable; data fusion; data structure; Expectation-Maximization algorithm; logistic regression; matching
DOI
10.1080/03610910802203295
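Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract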
Given the very large amount of data collected every day through population surveys, much new research could reuse this information instead of collecting new samples. Unfortunately, the relevant data are often scattered across different files obtained through different sampling designs. Data fusion is a set of methods used to combine information from different sources into a single dataset. In this article, we are interested in a specific problem: the fusion of two data files, one of which is quite small. We propose a model-based procedure combining a logistic regression with an Expectation-Maximization algorithm. Results show that, despite the lack of data, this procedure can perform better than standard matching procedures.
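The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of model-based imputation of a binary variable, not the authors' method. It assumes a small donor file holding common covariates and the binary target, and a large recipient file holding only the covariates. A logistic regression is fitted on the donor file and used to draw imputed values for the recipient file. The Expectation-Maximization step mentioned in the abstract is omitted, and all names (`fit_logistic`, `impute_binary`, the simulated data and coefficients) are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(X, y, n_iter=25, ridge=1e-3):
    """Fit a logistic regression by Newton-Raphson with a small ridge penalty.

    Returns the coefficient vector, intercept first.
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ beta)
        w = p * (1.0 - p)
        grad = Xb.T @ (y - p) - ridge * beta
        hess = (Xb * w[:, None]).T @ Xb + ridge * np.eye(Xb.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def impute_binary(X_donor, z_donor, X_recip, rng):
    """Draw imputed 0/1 values for the recipient file from the fitted model."""
    beta = fit_logistic(X_donor, z_donor)
    Xb = np.column_stack([np.ones(len(X_recip)), X_recip])
    p = sigmoid(Xb @ beta)
    z = (rng.random(len(p)) < p).astype(int)
    return z, p


# Simulated data (assumption): a small donor file and a large recipient file.
rng = np.random.default_rng(0)
true_beta = np.array([-0.5, 1.0, -1.0])  # hypothetical coefficients

X_donor = rng.normal(size=(60, 2))
z_donor = (rng.random(60) < sigmoid(true_beta[0] + X_donor @ true_beta[1:])).astype(int)

X_recip = rng.normal(size=(2000, 2))
p_true = sigmoid(true_beta[0] + X_recip @ true_beta[1:])

z_imputed, p_hat = impute_binary(X_donor, z_donor, X_recip, rng)
```

Drawing the imputed values from the fitted probabilities, rather than rounding them, is one common way to keep the marginal distribution of the imputed variable close to what the model implies. A matching procedure would instead copy the value of the nearest donor record.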
Pages: 1316-1328
Number of pages: 13