The need for multimodal health data modeling: A practical approach for a federated-learning healthcare platform

Cited by: 25
Authors
Cremonesi, Francesco [1, 8]
Planat, Vincent [2]
Kalokyri, Varvara [3]
Kondylakis, Haridimos [3]
Sanavia, Tiziana [4]
Resinas, Victor Miguel Mateos [5]
Singh, Babita [6]
Uribe, Silvia [7]
Affiliations
[1] Univ Côte d'Azur, Epione Res Project, Inria Sophia Antipolis Méditerranée, Nice, France
[2] Dedalus, Global Consulting, Le Plessis Robinson, France
[3] Fdn Res & Technol Hellas, Inst Comp Sci, Iraklion, Greece
[4] Univ Torino, Dept Med Sci, Turin, Italy
[5] Dedalus Healthcare, Malaga, Spain
[6] Barcelona Inst Sci & Technol, Ctr Genom Regulat CRG, Barcelona, Spain
[7] Univ Politecn Madrid, Escuela Tecn Super Ingn Sistemas Informat, Madrid, Spain
[8] Datawizard srl, Rome, Italy
Keywords
Federated learning; Data model; Healthcare; Medical research; Omics; Lessons learned; MEDICAL DATA; ARCHITECTURE; INFORMATICS; PRIVACY
DOI
10.1016/j.jbi.2023.104338
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Federated learning initiatives in healthcare are being developed to collaboratively train predictive models without the need to centralize sensitive personal data. GenoMed4All is one such project, with the goal of connecting European clinical and -omics data repositories on rare diseases through a federated learning platform. Currently, the consortium faces the challenge of a lack of well-established international datasets and interoperability standards for federated learning applications on rare diseases. This paper presents our practical approach to selecting and implementing a Common Data Model (CDM) suitable for the federated training of predictive models applied to the medical domain, during the initial design phase of our federated learning platform. We describe our selection process, which consists of identifying the consortium's needs, reviewing our functional and technical architecture specifications, and extracting a list of business requirements. We review the state of the art and evaluate three widely used approaches (FHIR, OMOP and Phenopackets) based on a checklist of requirements and specifications. We discuss the pros and cons of each approach considering the use cases specific to our consortium as well as the generic issues of implementing a European federated learning healthcare platform. A list of lessons learned from the experience in our consortium is discussed, from the importance of establishing the proper communication channels for all stakeholders to technical aspects related to -omics data. For federated learning projects focused on secondary use of health data for predictive modeling, encompassing multiple data modalities, a phase of data model convergence is sorely needed to gather different data representations developed in the context of medical research, interoperability of clinical care software, imaging, and -omics analysis into a coherent, unified data model. Our work identifies this need and presents our experience and a list of actionable lessons learned for future work in this direction.
Pages: 12