Model-Based and Model-Free Techniques for Amyotrophic Lateral Sclerosis Diagnostic Prediction and Patient Clustering

Cited by: 21
Authors
Tang, Ming [1, 2]
Gao, Chao [1, 2]
Goutman, Stephen A. [3]
Kalinin, Alexandr [1, 4]
Mukherjee, Bhramar [2]
Guan, Yuanfang [4]
Dinov, Ivo D. [1, 4, 5]
Affiliations
[1] Univ Michigan, Stat Online Computat Resource, Dept Hlth Behav & Biol Sci, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Dept Biostat, Ann Arbor, MI 48109 USA
[3] Univ Michigan, Dept Neurol, Ann Arbor, MI 48109 USA
[4] Univ Michigan, Dept Computat Med & Bioinformat, Ann Arbor, MI 48109 USA
[5] Univ Michigan, Michigan Inst Data Sci, Ann Arbor, MI 48109 USA
Keywords
ALS; Amyotrophic lateral sclerosis; Decision support; Machine learning; Predictive analytics; Data science; Big data; DISEASE PROGRESSION; RANDOM FOREST; ALS; OPTIMIZATION; IMPUTATION; WINDOWS; SCALE;
DOI
10.1007/s12021-018-9406-9
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Amyotrophic lateral sclerosis (ALS) is a complex progressive neurodegenerative disorder with an estimated prevalence of about 5 per 100,000 people in the United States. In this study, ALS disease progression is measured by the change in the Amyotrophic Lateral Sclerosis Functional Rating Scale (ALSFRS) score over time. The study aims to provide clinical decision support for timely forecasting of the ALS trajectory as well as accurate and reproducible computable phenotypic clustering of participants. Patient data are extracted from the DREAM-Phil Bowen ALS Prediction Prize4Life Challenge data, most of which come from the Pooled Resource Open-Access ALS Clinical Trials Database (PRO-ACT) archive. We employed model-based and model-free machine-learning methods to predict the change in the ALSFRS score over time. Using training and testing data, we quantified and compared the performance of the different techniques. We also used unsupervised machine-learning methods to cluster the patients into separate computable phenotypes and to interpret the derived subcohorts. Direct prediction of univariate clinical outcomes using model-based (linear models) or model-free (machine-learning-based techniques: random forest and Bayesian adaptive regression trees) methods was only moderately successful. The correlation coefficients between the clinically observed changes in ALSFRS scores and their model-based/model-free predicted counterparts were 0.427 (random forest) and 0.545 (BART). The reliability of these results was assessed using internal statistical cross-validation as well as external data validation. Unsupervised clustering generated very reliable and consistent partitions of the patient cohort into four computable phenotypic subgroups. These clusters were explicated by identifying specific salient clinical features included in the PRO-ACT archive that discriminate between the derived subcohorts.
There are differences between alternative analytical methods in forecasting specific clinical phenotypes. Although predicting univariate clinical outcomes may be challenging, our results suggest that modern data science strategies are useful in clustering patients and generating evidence-based ALS hypotheses about complex interactions of multivariate factors. Predicting univariate clinical outcomes using the PRO-ACT data yields only marginal accuracy (about 70%). However, unsupervised clustering of participants into sub-groups generates stable, reliable and consistent (exceeding 95%) computable phenotypes whose explication requires interpretation of multivariate sets of features.

Highlights
· Used a large ALS data archive of 8,000 patients consisting of 3 million records, including 200 clinical features tracked over 12 months.
· Employed model-based and model-free methods to predict ALSFRS changes over time, cluster patients into cohorts, and derive computable phenotypes.
· Research findings include stable, reliable, and consistent (95%) patient stratification into computable phenotypes. However, clinical explication of the results requires interpretation of multivariate information.
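The evaluation described in the abstract, comparing observed and predicted changes in ALSFRS score by a correlation coefficient, can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that the "change of ALSFRS score over time" is summarized as a least-squares slope in points per month and that the reported coefficients are Pearson correlations; neither detail is stated in this record. The function names `alsfrs_slope` and `pearson_r` are hypothetical.

```python
from math import sqrt


def alsfrs_slope(months, scores):
    """Least-squares slope of ALSFRS score against time (points per month).

    Assumption: progression is summarized as a linear slope per patient;
    the record above only says "change of ALSFRS score over time".
    """
    n = len(months)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two paired (month, score) visits")
    mean_t = sum(months) / n
    mean_s = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in months)
    if sxx == 0:
        raise ValueError("all visits share the same time point")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(months, scores))
    return sxy / sxx


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values.

    Assumption: the correlation coefficients quoted in the abstract are
    of this type.
    """
    n = len(observed)
    if n < 2 or n != len(predicted):
        raise ValueError("need at least two paired values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_o * var_p)
```

In use, one would compute a slope per patient from that patient's visits, collect the observed slopes and a model's predicted slopes for the test set, and pass both lists to `pearson_r`. Any predictor (linear model, random forest, BART) can be compared this way because the metric only sees the two lists.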
Pages: 407-421
Page count: 15