Heterogeneity-aware and communication-efficient distributed statistical inference

Cited by: 47
Authors
Duan, Rui [1]
Ning, Yang [2]
Chen, Yong [3]
Affiliations
[1] Harvard Univ, Dept Biostat, 677 Huntington Ave, Boston, MA 02115 USA
[2] Cornell Univ, Dept Stat & Data Sci, Comstock Hall 1188, Ithaca, NY 14853 USA
[3] Univ Penn, Dept Biostat Epidemiol & Informat, 423 Guardian Dr, Philadelphia, PA 19104 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
Data integration; Distributed inference; Efficient score; Surrogate likelihood; Two-index asymptotics; METAANALYSIS; PRIVACY;
DOI
10.1093/biomet/asab007
Chinese Library Classification number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
In multicentre research, individual-level data are often protected against sharing across sites. To overcome the barrier of data sharing, many distributed algorithms, which only require sharing aggregated information, have been developed. The existing distributed algorithms usually assume the data are homogeneously distributed across sites. This assumption ignores the important fact that the data collected at different sites may come from various subpopulations and environments, which can lead to heterogeneity in the distribution of the data. Ignoring the heterogeneity may lead to erroneous statistical inference. We propose distributed algorithms which account for the heterogeneous distributions by allowing site-specific nuisance parameters. The proposed methods extend the surrogate likelihood approach to the heterogeneous setting by applying a novel density ratio tilting method to the efficient score function. The proposed algorithms maintain the same communication cost as existing communication-efficient algorithms. We establish a nonasymptotic risk bound for the proposed distributed estimator and its limiting distribution in the two-index asymptotic setting, which allows both sample size per site and the number of sites to go to infinity. In addition, we show that the asymptotic variance of the estimator attains the Cramér-Rao lower bound when the number of sites is smaller in rate than the sample size at each site. Finally, we use simulation studies and a real data application to demonstrate the validity and feasibility of the proposed methods.
Pages: 67-83
Page count: 17