Transcriptome prediction performance across machine learning models and diverse ancestries

Cited by: 16
Authors
Okoro, Paul C. [1]
Schubert, Ryan [2]
Guo, Xiuqing [3, 4]
Johnson, W. Craig [5]
Rotter, Jerome I. [3, 4]
Hoeschele, Ina [6, 7, 8]
Liu, Yongmei [9]
Im, Hae Kyung [10]
Luke, Amy [11]
Dugas, Lara R. [11, 12]
Wheeler, Heather E. [1, 13, 14]
Affiliations
[1] Loyola Univ Chicago, Program Bioinformat, Chicago, IL 60660 USA
[2] Loyola Univ Chicago, Dept Math & Stat, Chicago, IL USA
[3] Harbor UCLA Med Ctr, Inst Translat Genom & Populat Sci, Lundquist Inst, Torrance, CA 90509 USA
[4] Harbor UCLA Med Ctr, Dept Pediat, Torrance, CA 90509 USA
[5] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
[6] Virginia Tech, Fralin Life Sci Inst, Blacksburg, VA USA
[7] Virginia Tech, Dept Stat, Blacksburg, VA USA
[8] Wake Forest Sch Med, Winston Salem, NC 27101 USA
[9] Duke Univ, Sch Med, Dept Med, Durham, NC 27706 USA
[10] Univ Chicago, Dept Med, Sect Genet Med, 5841 S Maryland Ave, Chicago, IL 60637 USA
[11] Loyola Univ Chicago, Parkinson Sch Hlth Sci & Publ Hlth, Dept Publ Hlth Sci, Maywood, IL USA
[12] Univ Cape Town, Fac Hlth Sci, Dept Human Biol, Cape Town, South Africa
[13] Loyola Univ Chicago, Dept Biol, Chicago, IL 60660 USA
[14] Loyola Univ Chicago, Dept Comp Sci, Chicago, IL 60660 USA
Source
HUMAN GENETICS AND GENOMICS ADVANCES | 2021 / Vol. 2 / Issue 02
Keywords
GENOME-WIDE ASSOCIATION; GENE-EXPRESSION; VARIABLE SELECTION; COMPLEX TRAITS; REGRESSION; CETP; STRATIFICATION; REGULARIZATION; INFERENCE; HDL;
DOI
10.1016/j.xhgg.2020.100019
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007; 090102;
Abstract
Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods that make parametric linear assumptions, such as the elastic net (EN). To potentially further optimize imputation performance of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and evaluated their performance in comparison to EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA) comprising individuals of African, Hispanic, and European ancestries and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiology Transition Study (METS) comprising individuals of African ancestries. We show that the prediction performance is highest when the training and the testing populations share similar ancestries, regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and K-nearest neighbors (KNN), we found that RF outperformed EN for some genes, particularly between disparate ancestries, suggesting potential robustness and reduced variability of RF imputation performance across global populations. When applied to a high-density lipoprotein (HDL) phenotype, we show that including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML modeling into PrediXcan and diversifying our training populations to include more global ancestries, we may uncover new genes associated with complex traits.
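The kind of comparison the abstract describes (fit a linear EN model and a non-linear alternative on a training cohort, then score both on a held-out cohort) can be illustrated with a small, dependency-free sketch. This is not the authors' pipeline: the genotypes and expression values are simulated, both cohorts are drawn from the same distribution (so no ancestry effect is modeled), the penalty settings `lam` and `alpha`, the neighbor count `k`, and the use of Pearson correlation as the performance metric are all assumptions, and every function name here is hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def simulate(n, n_snps, effects, noise_sd, rng, maf=0.3):
    """Simulate 0/1/2 genotype dosages and a linear expression trait (toy data)."""
    X = [[sum(rng.random() < maf for _ in range(2)) for _ in range(n_snps)]
         for _ in range(n)]
    y = [sum(b * row[j] for j, b in effects.items()) + rng.gauss(0, noise_sd)
         for row in X]
    return X, y


def soft_threshold(z, g):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def fit_elastic_net(X, y, lam=0.05, alpha=0.5, n_iter=100):
    """Elastic net via cyclic coordinate descent on centered data.

    Minimizes (1/2n)||y - Xb||^2 + lam * (alpha*|b|_1 + (1-alpha)/2 * |b|_2^2).
    lam and alpha are assumed values, not tuned by cross-validation.
    """
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]
    beta = [0.0] * p
    z = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0:
                continue  # monomorphic SNP carries no information
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (z[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, col_mean))
    return intercept, beta


def predict_knn(X_train, y_train, X_new, k=10):
    """KNN regression: average the trait over the k nearest training genotypes."""
    preds = []
    for row in X_new:
        dists = sorted((sum((a - b) ** 2 for a, b in zip(row, t)), i)
                       for i, t in enumerate(X_train))
        preds.append(sum(y_train[i] for _, i in dists[:k]) / k)
    return preds


def compare_models(seed=0):
    """Train EN and KNN on one simulated cohort and score them on another."""
    rng = random.Random(seed)
    effects = {0: 1.0, 1: -0.8, 2: 0.5}  # three hypothetical causal SNPs
    X_tr, y_tr = simulate(120, 20, effects, 0.3, rng)
    X_te, y_te = simulate(60, 20, effects, 0.3, rng)
    b0, beta = fit_elastic_net(X_tr, y_tr)
    en_pred = [b0 + sum(b * g for b, g in zip(beta, row)) for row in X_te]
    knn_pred = predict_knn(X_tr, y_tr, X_te)
    return {"EN": pearson(y_te, en_pred), "KNN": pearson(y_te, knn_pred)}


if __name__ == "__main__":
    for name, r in compare_models().items():
        print(f"{name}: test-set Pearson r = {r:.3f}")
```

Because the simulated trait is purely additive, this toy setup favors the linear model by construction; it only illustrates the train-then-test comparison and says nothing about which method performs better on real expression data.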
Pages: 14
Related papers
50 in total
  • [41] Reliable machine learning models in genomic medicine using conformal prediction
    Papangelou, Christina
    Kyriakidis, Konstantinos
    Natsiavas, Pantelis
    Chouvarda, Ioanna
    Malousi, Andigoni
    FRONTIERS IN BIOINFORMATICS, 2025, 5
  • [42] A review of machine learning models applied to genomic prediction in animal breeding
    Chafai, Narjice
    Hayah, Ichrak
    Houaga, Isidore
    Badaoui, Bouabid
    FRONTIERS IN GENETICS, 2023, 14
  • [43] Machine learning models for orthokeratology lens fitting and axial length prediction
    Xu, Shuai
    Yang, Xiaoyan
    Zhang, Shuxian
    Zheng, Xuan
    Zheng, Fang
    Liu, Yin
    Zhang, Hanyu
    Ye, Qing
    Li, Lihua
    OPHTHALMIC AND PHYSIOLOGICAL OPTICS, 2023, 43 (06) : 1462 - 1468
  • [44] Grape Yield Prediction Models: Approaching Different Machine Learning Algorithms
    Andrade, Caio Bustani
    Moura-Bueno, Jean Michel
    Comin, Jucinei Jose
    Brunetto, Gustavo
    HORTICULTURAE, 2023, 9 (12)
  • [45] Benchmarking Parametric and Machine Learning Models for Genomic Prediction of Complex Traits
    Azodi, Christina B.
    Bolger, Emily
    McCarren, Andrew
    Roantree, Mark
    de los Campos, Gustavo
    Shiu, Shin-Han
    G3-GENES GENOMES GENETICS, 2019, 9 (11) : 3691 - 3702
  • [46] A Comparative Analysis of Machine Learning Models in Prediction of Mortar Compressive Strength
    Gayathri, Rajakumaran
    Rani, Shola Usha
    Cepova, Lenka
    Rajesh, Murugesan
    Kalita, Kanak
    PROCESSES, 2022, 10 (07)
  • [47] Machine Learning Models for Prediction of Sex Based on Lumbar Vertebral Morphometry
    Diac, Madalina Maria
    Toma, Gina Madalina
    Damian, Simona Irina
    Fotache, Marin
    Romanov, Nicolae
    Tabian, Daniel
    Sechel, Gabriela
    Scripcaru, Andrei
    Hancianu, Monica
    Iliescu, Diana Bulgaru
    DIAGNOSTICS, 2023, 13 (24)
  • [48] River Water Salinity Prediction Using Hybrid Machine Learning Models
    Melesse, Assefa M.
    Khosravi, Khabat
    Tiefenbacher, John P.
    Heddam, Salim
    Kim, Sungwon
    Mosavi, Amir
    Pham, Binh Thai
    WATER, 2020, 12 (10) : 1 - 21
  • [49] Gully erosion susceptibility prediction in Mollisols using machine learning models
    Wang, Y.
    Zhang, Y.
    Chen, H.
    JOURNAL OF SOIL AND WATER CONSERVATION, 2023, 78 (05) : 385 - 396
  • [50] Battery lifetime prediction across diverse ageing conditions with inter-cell deep learning
    Zhang, Han
    Li, Yuqi
    Zheng, Shun
    Lu, Ziheng
    Gui, Xiaofan
    Xu, Wei
    Bian, Jiang
    NATURE MACHINE INTELLIGENCE, 2025, 7 (02) : 270 - 277