A system to build distributed multivariate models and manage disparate data sharing policies: implementation in the scalable national network for effectiveness research

Times cited: 9
Authors
Meeker, Daniella [1, 4]
Jiang, Xiaoqian [2 ]
Matheny, Michael E. [3 ]
Farcas, Claudiu [2 ]
D'Arcy, Michel [4 ]
Pearlman, Laura [4 ]
Nookala, Lavanya [3 ]
Day, Michele E. [2 ]
Kim, Katherine K. [5, 6]
Kim, Hyeoneui [2 ]
Boxwala, Aziz [2 ]
El-Kareh, Robert [2 ]
Kuo, Grace M. [7 ]
Resnic, Frederic S. [8 ]
Kesselman, Carl [4 ]
Ohno-Machado, Lucila [2 ]
Affiliations
[1] Univ So Calif, Dept Prevent Med, 1450 Biggy St,Bldg 288, Los Angeles, CA 90033 USA
[2] Univ Calif San Diego, Dept Biomed Informat, La Jolla, CA 92093 USA
[3] Geriatr Res Educ & Clin Care Serv, New York, NY USA
[4] Univ So Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
[5] Univ Calif Davis, Dept Pathol & Lab Med, Sacramento, CA 95817 USA
[6] Univ Calif Davis, Dept Internal Med, Sacramento, CA 95817 USA
[7] Univ Calif San Diego, Skaggs Sch Pharm & Pharmaceut Sci, La Jolla, CA 92093 USA
[8] Lahey Hosp & Med Ctr, Burlington, MA USA
Keywords
distributed analytics; federated research network; privacy-preserving network infrastructure; comparative effectiveness research; LEARNING HEALTH SYSTEM; INFORMATICS; INFRASTRUCTURE; PCORNET; FOOD
DOI
10.1093/jamia/ocv017
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Background: Centralized and federated models for sharing data in research networks currently exist. To build multivariate data analyses for centralized networks, patient-level data must be transferred to a central computation resource. The authors implemented distributed multivariate models for federated networks in which patient-level data are kept at each site and data exchange policies are managed in a study-centric manner.
Objective: To implement infrastructure that supports the functionality of some existing research networks (e.g., cohort discovery, workflow management, and estimation of multivariate analytic models on centralized data) while adding important new features, such as algorithms for distributed iterative multivariate models, a graphical interface for multivariate model specification, synchronous and asynchronous response to network queries, investigator-initiated studies, and study-based control of staff, protocols, and data sharing policies.
Materials and Methods: Based on the requirements gathered from statisticians, administrators, and investigators from multiple institutions, the authors developed infrastructure and tools to support multisite comparative effectiveness studies using web services for multivariate statistical estimation in the SCANNER federated network.
Results: The authors implemented massively parallel (map-reduce) computation methods and a new policy management system that lets each study initiated by network participants define how data may be processed, managed, queried, and shared. The authors illustrated the use of these systems among institutions with widely differing policies that operate under different state laws.
Discussion and Conclusion: Federated research networks need not limit distributed query functionality to count queries, cohort discovery, or independently estimated analytic models. Multivariate analyses can be efficiently and securely conducted without patient-level data transport, allowing institutions with strict local data storage requirements to participate in sophisticated analyses based on federated research networks.
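To make the idea of a distributed iterative multivariate model concrete, the following is a minimal sketch of a logistic regression fitted by Newton-Raphson in which each site returns only aggregate summaries (a score vector and an information matrix) and a coordinator sums them. It is an illustration under stated assumptions, not the SCANNER implementation: the function names (`local_summaries`, `fit_distributed`, `make_sites`), the synthetic data, and the choice of logistic regression are all hypothetical and do not come from the record above.

```python
import math
import random


def local_summaries(X, y, beta):
    """Runs at one site. Returns the score vector X'(y - mu) and the
    information matrix X'WX for this site's rows at the current beta.
    Only these aggregates, never the rows themselves, leave the site."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for row, label in zip(X, y):
        z = sum(b * x for b, x in zip(beta, row))
        mu = 1.0 / (1.0 + math.exp(-z))
        w = mu * (1.0 - mu)
        for i in range(p):
            score[i] += (label - mu) * row[i]
            for j in range(p):
                info[i][j] += w * row[i] * row[j]
    return score, info


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_distributed(sites, n_features, iters=25, tol=1e-10):
    """Coordinator: Newton-Raphson driven by summed per-site summaries."""
    beta = [0.0] * n_features
    for _ in range(iters):
        score = [0.0] * n_features
        info = [[0.0] * n_features for _ in range(n_features)]
        for X, y in sites:  # "map": each site computes on its own data
            s, h = local_summaries(X, y, beta)
            for i in range(n_features):  # "reduce": add the aggregates
                score[i] += s[i]
                for j in range(n_features):
                    info[i][j] += h[i][j]
        step = solve(info, score)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return beta


def make_sites(seed=0, n_sites=3, rows_per_site=100):
    """Hypothetical synthetic data: an intercept plus two covariates."""
    rng = random.Random(seed)
    true_beta = [-0.5, 1.0, -1.5]
    sites = []
    for _ in range(n_sites):
        X, y = [], []
        for _ in range(rows_per_site):
            row = [1.0, rng.gauss(0, 1), rng.gauss(0, 1)]
            z = sum(b * x for b, x in zip(true_beta, row))
            y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0)
            X.append(row)
        sites.append((X, y))
    return sites


sites = make_sites()
beta_federated = fit_distributed(sites, n_features=3)
print(beta_federated)
```

Because the score and information matrix are sums over rows, adding the per-site aggregates gives the same Newton step as pooling all rows in one place, so in this sketch the federated estimate should agree with a centralized fit up to floating-point rounding. A real deployment would also need the policy checks, authentication, and transport that the abstract describes, none of which are modeled here.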
Pages: 1187-1195
Page count: 9