Distributed learning on 20 000+ lung cancer patients - The Personal Health Train

Times cited: 89
Authors
Deist, Timo M. [1, 2]
Dankers, Frank J. W. M. [1, 3]
Ojha, Priyanka [4]
Marshall, M. Scott [4]
Janssen, Tomas [4]
Faivre-Finn, Corinne [5]
Masciocchi, Carlotta [7]
Valentini, Vincenzo [6, 7]
Wang, Jiazhou [8]
Chen, Jiayan [8]
Zhang, Zhen [8]
Spezi, Emiliano [9, 10]
Button, Mick [10]
Nuyttens, Joost Jan [1, 11]
Vernhout, Rene [11]
van Soest, Johan
Jochems, Arthur [2]
Monshouwer, Rene [3]
Bussink, Johan [3]
Price, Gareth [5]
Lambin, Philippe [2]
Dekker, Andre [1]
Affiliations
[1] Maastricht Univ Med Ctr, GROW Sch Oncol & Dev Biol, Dept Radiat Oncol MAASTRO, Maastricht, Netherlands
[2] Maastricht Univ Med Ctr, GROW Sch Oncol & Dev Biol, D Lab Dept Precis Med, Maastricht, Netherlands
[3] Radboud Univ Nijmegen, Med Ctr, Dept Radiat Oncol, Nijmegen, Netherlands
[4] Netherlands Canc Inst Antoni van Leeuwenhoek, Dept Radiat Oncol, Amsterdam, Netherlands
[5] Univ Manchester, Manchester Acad Hlth Sci Ctr, Christie NHS Fdn Trust, Manchester, Lancs, England
[6] Univ Cattolica Sacro Cuore, Milan, Italy
[7] Fdn Policlin Univ A Gemelli IRCCS, Rome, Italy
[8] Fudan Univ, Shanghai Canc Ctr, Dept Radiat Oncol, Dept Oncol, Shanghai Med Coll, Shanghai, Peoples R China
[9] Cardiff Univ, Sch Engn, Cardiff, Wales
[10] Velindre Canc Ctr, Cardiff, Wales
[11] Erasmus MC, Canc Inst, Dept Radiat Oncol, Rotterdam, Netherlands
Funding
European Union Horizon 2020;
Keywords
Lung cancer; Big data; Distributed learning; Federated learning; Machine learning; Survival analysis; Prediction modeling; FAIR data; CARE;
DOI
10.1016/j.radonc.2019.11.019
Chinese Library Classification (CLC) number
R73 [Oncology];
Subject classification code
100214;
Abstract
Background and purpose: Access to healthcare data is indispensable for scientific progress and innovation. Sharing healthcare data is time-consuming and notoriously difficult due to privacy and regulatory concerns. The Personal Health Train (PHT) provides a privacy-by-design infrastructure that connects FAIR (Findable, Accessible, Interoperable, Reusable) data sources and enables distributed data analysis and machine learning. Patient data never leave the healthcare institute.

Materials and methods: Lung cancer patient-specific databases (tumor staging and post-treatment survival information) of oncology departments were translated according to a FAIR data model and stored locally in a graph database. Software was installed locally to enable deployment of distributed machine learning algorithms via a central server. The algorithms (MATLAB; code and documentation publicly available) preserve patient privacy because only summary statistics and regression coefficients are exchanged with the central server. A logistic regression model predicting post-treatment two-year survival was trained and evaluated using receiver operating characteristic (ROC) curves, root mean square prediction error (RMSE), and calibration plots.

Results: Within 4 months, we used the PHT to connect databases containing 23 203 patient cases across 8 healthcare institutes in 5 countries, located in Amsterdam, Cardiff, Maastricht, Manchester, Nijmegen, Rome, Rotterdam, and Shanghai. Summary statistics were computed across databases. A distributed logistic regression model predicting post-treatment two-year survival was trained on 14 810 patients treated between 1978 and 2011 and validated on 8 393 patients treated between 2012 and 2015.

Conclusion: The PHT infrastructure demonstrably overcomes patient privacy barriers to healthcare data sharing and enables fast data analyses across multiple institutes in different countries with different regulatory regimes. This infrastructure promotes global evidence-based medicine while prioritizing patient privacy.

(C) 2019 The Authors. Published by Elsevier B.V.
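The abstract states that only summary statistics and regression coefficients are exchanged with the central server. As a minimal sketch of how a logistic regression can be fitted under that constraint, assuming a distributed Newton-Raphson scheme (one common approach, and not necessarily the algorithm in the authors' MATLAB code), each site below returns only its local gradient X^T(y - p) and Hessian X^T W X for the current coefficients; the server sums these aggregates and updates the shared coefficients. The function names (`local_summary`, `fit_distributed`, `rmse`) and the toy data are hypothetical and purely illustrative.

```python
"""Sketch: distributed logistic regression where sites share only aggregates.

ASSUMPTION: distributed Newton-Raphson, chosen for illustration; this is not
claimed to be the PHT authors' algorithm. Each site returns its local
gradient X^T (y - p) and Hessian X^T W X; no patient rows leave the site.
"""
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_summary(X: Matrix, y: Vector, beta: Vector) -> Tuple[Vector, Matrix]:
    """Runs inside one institute: returns gradient and Hessian sums only."""
    k = len(beta)
    grad = [0.0] * k
    hess = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * x for b, x in zip(beta, xi)))
        w = p * (1.0 - p)
        for a in range(k):
            grad[a] += xi[a] * (yi - p)
            for c in range(k):
                hess[a][c] += w * xi[a] * xi[c]
    return grad, hess


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_distributed(sites: List[Tuple[Matrix, Vector]], n_features: int,
                    iters: int = 25) -> Vector:
    """Central server: sums site summaries and takes Newton steps."""
    beta = [0.0] * n_features
    for _ in range(iters):
        grad = [0.0] * n_features
        hess = [[0.0] * n_features for _ in range(n_features)]
        for X, y in sites:  # in a real deployment this would be a remote call
            g, h = local_summary(X, y, beta)
            grad = [a + b for a, b in zip(grad, g)]
            hess = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(hess, h)]
        step = solve(hess, grad)  # Newton step: H^{-1} X^T (y - p)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def rmse(X: Matrix, y: Vector, beta: Vector) -> float:
    """Root mean square error between predicted probabilities and outcomes."""
    errs = [(sigmoid(sum(b * x for b, x in zip(beta, xi))) - yi) ** 2
            for xi, yi in zip(X, y)]
    return math.sqrt(sum(errs) / len(errs))


if __name__ == "__main__":
    # Toy, invented data: column 0 is an intercept, column 1 a single feature.
    site_a = ([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]], [0.0, 1.0, 0.0, 1.0])
    site_b = ([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]], [0.0, 1.0, 1.0, 1.0])
    beta = fit_distributed([site_a, site_b], n_features=2)
    print("coefficients:", beta)
    print("RMSE on site B:", rmse(*site_b, beta))
```

Because the summed site-level gradients and Hessians equal the pooled quantities, this scheme converges to the same coefficients as a fit on the pooled data, while only k-length vectors and k-by-k matrices cross institutional boundaries.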
Pages: 189-200
Number of pages: 12