Model-Based and Model-Free Techniques for Amyotrophic Lateral Sclerosis Diagnostic Prediction and Patient Clustering

Cited by: 21
Authors
Tang, Ming [1, 2]
Gao, Chao [1, 2]
Goutman, Stephen A. [3]
Kalinin, Alexandr [1, 4]
Mukherjee, Bhramar [2]
Guan, Yuanfang [4]
Dinov, Ivo D. [1, 4, 5]
Affiliations
[1] Univ Michigan, Stat Online Computat Resource, Dept Hlth Behav & Biol Sci, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Dept Biostat, Ann Arbor, MI 48109 USA
[3] Univ Michigan, Dept Neurol, Ann Arbor, MI 48109 USA
[4] Univ Michigan, Dept Computat Med & Bioinformat, Ann Arbor, MI 48109 USA
[5] Univ Michigan, Michigan Inst Data Sci, Ann Arbor, MI 48109 USA
Keywords
ALS; Amyotrophic lateral sclerosis; Decision support; Machine learning; Predictive analytics; Data science; Big data; DISEASE PROGRESSION; RANDOM FOREST; ALS; OPTIMIZATION; IMPUTATION; WINDOWS; SCALE
DOI
10.1007/s12021-018-9406-9
CLC classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Amyotrophic lateral sclerosis (ALS) is a complex progressive neurodegenerative disorder with an estimated prevalence of about 5 per 100,000 people in the United States. In this study, ALS disease progression is measured by the change in the Amyotrophic Lateral Sclerosis Functional Rating Scale (ALSFRS) score over time. The study aims to provide clinical decision support for timely forecasting of the ALS trajectory, as well as accurate and reproducible computable phenotypic clustering of participants. Patient data were extracted from the DREAM-Phil Bowen ALS Prediction Prize4Life Challenge data, most of which come from the Pooled Resource Open-Access ALS Clinical Trials Database (PRO-ACT) archive. We employed model-based and model-free machine-learning methods to predict the change in the ALSFRS score over time. Using training and testing data, we quantified and compared the performance of the different techniques. We also used unsupervised machine-learning methods to cluster the patients into separate computable phenotypes and to interpret the derived subcohorts. Direct prediction of univariate clinical outcomes using model-based techniques (linear models) or model-free techniques (machine-learning methods, namely random forest and Bayesian additive regression trees, BART) was only moderately successful. The correlation coefficients between the clinically observed changes in ALSFRS scores and their predicted counterparts were 0.427 (random forest) and 0.545 (BART). The reliability of these results was assessed using internal statistical cross-validation as well as external data validation. Unsupervised clustering generated very reliable and consistent partitions of the patient cohort into four computable phenotypic subgroups. These clusters were explicated by identifying specific salient clinical features in the PRO-ACT archive that discriminate between the derived subcohorts.
Alternative analytical methods differ in how well they forecast specific clinical phenotypes. Although predicting univariate clinical outcomes may be challenging, our results suggest that modern data science strategies are useful for clustering patients and for generating evidence-based ALS hypotheses about complex interactions of multivariate factors. Predicting univariate clinical outcomes using the PRO-ACT data yields only marginal accuracy (about 70%). However, unsupervised clustering of participants into subgroups generates stable, reliable, and consistent (exceeding 95%) computable phenotypes whose explication requires interpretation of multivariate sets of features.
Highlights
• Used a large ALS data archive of 8,000 patients, consisting of 3 million records and including 200 clinical features tracked over 12 months.
• Employed model-based and model-free methods to predict ALSFRS changes over time, cluster patients into cohorts, and derive computable phenotypes.
• Research findings include stable, reliable, and consistent (95%) patient stratification into computable phenotypes. However, clinical explication of the results requires interpretation of multivariate information.
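The abstract reports prediction quality as a correlation between observed and predicted ALSFRS changes, and clustering quality as partition consistency. The following is a minimal, illustrative sketch of those quantities, not the authors' pipeline. Assumptions: the per-patient outcome is summarized as a least-squares ALSFRS slope (points per month), which may differ from the paper's exact outcome definition; agreement between two clusterings is measured with the plain Rand index, a swapped-in choice since the abstract does not name its consistency metric. The function names `alsfrs_slope`, `pearson_r`, and `rand_index` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def alsfrs_slope(months, scores):
    """Least-squares slope of ALSFRS score per month for one patient.

    Assumption (hypothetical outcome definition): progression is summarized
    as a linear rate of change across all visits supplied.
    """
    if len(months) != len(scores) or len(months) < 2:
        raise ValueError("need at least two paired (month, score) visits")
    mx, my = mean(months), mean(scores)
    sxx = sum((x - mx) ** 2 for x in months)
    if sxx == 0:
        raise ValueError("visit times must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(months, scores))
    return sxy / sxx


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted outcome values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values")
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = sum((o - mo) ** 2 for o in observed) ** 0.5
    sp = sum((p - mp) ** 2 for p in predicted) ** 0.5
    if so == 0 or sp == 0:
        raise ValueError("correlation is undefined for zero variance")
    return cov / (so * sp)


def rand_index(labels_a, labels_b):
    """Fraction of patient pairs on which two clusterings agree.

    A pair agrees if both clusterings put it in the same cluster, or both
    put it in different clusters; cluster label names do not matter.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need at least two patients with labels in both runs")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    # Toy patient losing 2 ALSFRS points every 3 months: slope is -2/3 per month.
    print(alsfrs_slope([0, 3, 6, 9, 12], [40, 38, 36, 34, 32]))
    # Toy observed vs. predicted slopes (made-up numbers, not study data).
    print(pearson_r([-0.5, -1.0, -0.2, -0.8], [-0.6, -0.7, -0.3, -0.9]))
    # Two runs that differ only by label names agree on every pair.
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

In practice a chance-corrected score such as the adjusted Rand index is often preferred for comparing partitions; the plain Rand index is used here only because it is easy to verify by hand.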
Pages: 407-421
Number of pages: 15