Brain-phenotype predictions of language and executive function can survive across diverse real-world data: Dataset shifts in developmental populations

Cited by: 0
Authors
Adkinson, Brendan D. [1 ]
Rosenblatt, Matthew [2 ]
Dadashkarimi, Javid [3 ,4 ]
Tejavibulya, Link [1 ]
Jiang, Rongtao [5 ]
Noble, Stephanie [5 ,6 ,7 ]
Scheinost, Dustin [1 ,2 ,5 ,8 ,9 ,10 ]
Affiliations
[1] Yale Sch Med, Interdept Neurosci Program, New Haven, CT 06510 USA
[2] Yale Univ, Dept Biomed Engn, New Haven, CT 06520 USA
[3] Massachusetts Gen Hosp, Athinoula Martinos Ctr Biomed Imaging, Dept Radiol, Charlestown, MA 02129 USA
[4] Harvard Med Sch, Dept Radiol, Boston, MA 02129 USA
[5] Yale Sch Med, Dept Radiol & Biomed Imaging, New Haven, CT 06510 USA
[6] Northeastern Univ, Dept Bioengn, Boston, MA 02120 USA
[7] Northeastern Univ, Dept Psychol, Boston, MA 02115 USA
[8] Yale Univ, Dept Stat & Data Sci, New Haven, CT 06520 USA
[9] Yale Sch Med, Child Study Ctr, New Haven, CT 06510 USA
[10] Yale Univ, Wu Tsai Inst, New Haven, CT 06510 USA
Funding
National Science Foundation (USA);
Keywords
Machine learning; Cognition; Dataset shift; Diversity; Harmonization; Childhood; Adolescence; INDIVIDUAL-DIFFERENCES; FMRI DATA; RELIABILITY;
DOI
10.1016/j.dcn.2024.101464
Chinese Library Classification number
B844 [Developmental psychology (human psychology)];
Discipline classification code
040202;
Abstract
Predictive modeling potentially increases the reproducibility and generalizability of neuroimaging brain-phenotype associations. Yet, the evaluation of a model in another dataset is underutilized. Among studies that undertake external validation, there is a notable lack of attention to generalization across dataset-specific idiosyncrasies (i.e., dataset shifts). Research settings, by design, remove the between-site variations that real-world and, eventually, clinical applications demand. Here, we rigorously test the ability of a range of predictive models to generalize across three diverse, unharmonized developmental samples: the Philadelphia Neurodevelopmental Cohort (n=1291), the Healthy Brain Network (n=1110), and the Human Connectome Project in Development (n=428). These datasets have high inter-dataset heterogeneity, encompassing substantial variations in age distribution, sex, racial and ethnic minority representation, recruitment geography, clinical symptom burdens, fMRI tasks, sequences, and behavioral measures. Through advanced methodological approaches, we demonstrate that reproducible and generalizable brain-behavior associations can be realized across diverse dataset features. Results indicate the potential of functional connectome-based predictive models to be robust despite substantial inter-dataset variability. Notably, for the HCPD and HBN datasets, the best predictions were not from training and testing in the same dataset (i.e., cross-validation) but across datasets. This result suggests that training on diverse data may improve prediction in specific cases. Overall, this work provides a critical foundation for future work evaluating the generalizability of brain-phenotype associations in real-world scenarios and clinical settings.
Pages: 11