Quantifying performance of machine learning methods for neuroimaging data

Cited by: 111
Authors
Jollans, Lee [1 ,2 ]
Boyle, Rory [1 ]
Artiges, Eric [3 ,4 ]
Banaschewski, Tobias [5 ]
Desrivieres, Sylvane [6 ]
Grigis, Antoine [7 ]
Martinot, Jean-Luc [3 ,8 ]
Paus, Tomas [9 ,10 ,11 ]
Smolka, Michael N. [12 ,13 ]
Walter, Henrik [14 ,15 ,16 ,17 ,18 ,19 ]
Schumann, Gunter [6 ]
Garavan, Hugh [20 ]
Whelan, Robert [1 ,21 ]
Affiliations
[1] Trinity Coll Dublin, Sch Psychol, Dublin, Ireland
[2] Max Planck Inst Psychiat, Dept Translat Res Psychiat, Munich, Germany
[3] Univ Paris 05, Univ Paris Sud, Sorbonne Paris Cite, INSERM Unit Neuroimaging & Psychiat 1000,Inst Nat, Paris, France
[4] Orsay Hosp, Psychiat Dept 91G16, Orsay, France
[5] Heidelberg Univ, Med Fac Mannheim, Cent Inst Mental Hlth, Dept Child & Adolescent Psychiat & Psychotherapy, Sq J5, D-68159 Mannheim, Germany
[6] Kings Coll London, Inst Psychiat Psychol & Neurosci, Social Genet & Dev Psychiat Ctr, MRC, London, England
[7] Univ Paris Saclay, CEA, NeuroSpin, F-91191 Gif Sur Yvette, France
[8] Maison de Solenn, Paris, France
[9] Univ Toronto, Holland Bloorview Kids Rehabil Hosp, Bloorview Res Inst, Toronto, ON M6A 2E1, Canada
[10] Univ Toronto, Dept Psychol, Toronto, ON M6A 2E1, Canada
[11] Univ Toronto, Dept Psychiat, Toronto, ON M6A 2E1, Canada
[12] Tech Univ Dresden, Dept Psychiat, Dresden, Germany
[13] Tech Univ Dresden, Neuroimaging Ctr, Dresden, Germany
[14] Charite Univ Med Berlin, Charitepl 1, Berlin, Germany
[15] Free Univ Berlin, Charitepl 1, Berlin, Germany
[16] Humboldt Univ, Charitepl 1, Berlin, Germany
[17] Berlin Inst Hlth, Charitepl 1, Berlin, Germany
[18] Dept Psychiat & Psychotherapy, Charitepl 1, Berlin, Germany
[19] Campus Charite Mitte, Charitepl 1, Berlin, Germany
[20] Univ Vermont, Dept Psychiat, Burlington, VT USA
[21] Trinity Coll Dublin, Global Brain Hlth Inst, Dublin, Ireland
Funding
UK Medical Research Council; Swedish Research Council; Science Foundation Ireland; EU Horizon 2020; US National Institutes of Health
Keywords
Machine learning; Neuroimaging; Regression algorithms; Reproducibility; FEATURE-SELECTION TECHNIQUES; CROSS-VALIDATION; SAMPLE-SIZE; PREDICTION; MRI; DEPRESSION; BIOMARKERS; CLASSIFICATION; PSYCHOSIS; ENSEMBLE;
DOI
10.1016/j.neuroimage.2019.05.082
Chinese Library Classification number
Q189 [Neuroscience]
Discipline classification code
071006
Abstract
Machine learning is increasingly being applied to neuroimaging data. However, most machine learning algorithms have not been designed to accommodate neuroimaging data, which typically has many more data points than subjects, in addition to multicollinearity and low signal-to-noise. Consequently, the relative efficacy of different machine learning regression algorithms for different types of neuroimaging data is not known. Here, we sought to quantify the performance of a variety of machine learning algorithms for use with neuroimaging data with various sample sizes, feature set sizes, and predictor effect sizes. The contribution of additional machine learning techniques (embedded feature selection and bootstrap aggregation, or bagging) to model performance was also quantified. Five machine learning regression methods (Gaussian Process Regression, Multiple Kernel Learning, Kernel Ridge Regression, the Elastic Net, and Random Forest) were examined with both real and simulated MRI data, and compared with standard multiple regression. The different machine learning regression algorithms produced varying results, which depended on sample size, feature set size, and predictor effect size. When the effect size was large, the Elastic Net, Kernel Ridge Regression, and Gaussian Process Regression performed well at most sample sizes and feature set sizes. However, when the effect size was small, only the Elastic Net made accurate predictions, and only in analyses with sample sizes greater than 400. Random Forest also produced moderate performance for small effect sizes, and did so across all sample sizes. Machine learning techniques also improved prediction accuracy for multiple regression. These data provide empirical evidence for the differential performance of various machine learning algorithms on neuroimaging data, which depends on sample size, number of features, and effect size.
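The kind of comparison the abstract describes can be illustrated with a minimal sketch. It is not the authors' pipeline: plain ridge regression stands in for the algorithms studied, the data are purely synthetic Gaussian noise with a handful of informative features, and every function name and setting (number of informative features, `alpha`, number of bootstrap resamples, sample and effect sizes) is an assumption chosen for illustration. It shows only how bagging and an out-of-sample correlation can be combined when features outnumber subjects.

```python
import numpy as np


def simulate(n_subjects, n_features, n_informative, effect, rng):
    """Hypothetical simulated data: a few informative features plus Gaussian noise."""
    X = rng.standard_normal((n_subjects, n_features))
    beta = np.zeros(n_features)
    beta[:n_informative] = effect
    y = X @ beta + rng.standard_normal(n_subjects)
    return X, y


def ridge_fit(X, y, alpha):
    """Ridge weights in dual form: w = X^T (X X^T + alpha I)^-1 y.

    The dual form only solves an n-by-n system, which is cheap
    when there are many more features than subjects.
    """
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + alpha * np.eye(n), y)


def bagged_ridge_predict(X_train, y_train, X_test, alpha, n_bags, rng):
    """Bagging: average ridge predictions over bootstrap resamples of subjects."""
    n = X_train.shape[0]
    preds = np.zeros((n_bags, X_test.shape[0]))
    for b in range(n_bags):
        idx = rng.integers(0, n, size=n)  # sample subjects with replacement
        w = ridge_fit(X_train[idx], y_train[idx], alpha)
        preds[b] = X_test @ w
    return preds.mean(axis=0)


def out_of_sample_r(n_subjects, n_features, effect, seed=0):
    """Correlation between predicted and true outcomes in held-out subjects."""
    rng = np.random.default_rng(seed)
    X, y = simulate(2 * n_subjects, n_features, 10, effect, rng)
    X_tr, y_tr = X[:n_subjects], y[:n_subjects]
    X_te, y_te = X[n_subjects:], y[n_subjects:]
    y_hat = bagged_ridge_predict(X_tr, y_tr, X_te, alpha=10.0, n_bags=25, rng=rng)
    return float(np.corrcoef(y_te, y_hat)[0, 1])


if __name__ == "__main__":
    # Assumed grid of sample sizes and effect sizes, for illustration only.
    for n in (50, 400):
        for effect in (0.1, 0.5):
            r = out_of_sample_r(n, n_features=500, effect=effect)
            print(f"n={n:4d}  effect={effect:.1f}  out-of-sample r={r:.3f}")
```

Running the script prints one held-out correlation per combination of sample size and effect size. The numbers depend entirely on the assumed simulation settings and should not be read as a reproduction of the reported results.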
Pages: 351-365
Number of pages: 15