BioM2: biologically informed multi-stage machine learning for phenotype prediction using omics data

Authors
Zhang, Shunjie [1]
Li, Pan [2]
Wang, Shenghan [2]
Zhu, Jijun [2]
Huang, Zhongting [2]
Cai, Fuqiang
Freidel, Sebastian [4]
Ling, Fei [1,2]
Schwarz, Emanuel [3,4]
Chen, Junfang [2,5]
Affiliations
[1] South China Univ Technol, Sch Biol & Biol Engn, Guangzhou, Peoples R China
[2] Fudan Univ, Greater Bay Area Inst Precis Med Guangzhou, Ctr Intelligent Med, Sch Life Sci, 6,2nd Nanjiang Rd, Guangzhou 511462, Peoples R China
[3] Heidelberg Univ, Hector Inst Artificial Intelligence Psychiat, Med Fac Mannheim, Cent Inst Mental Hlth, M7, D-68161 Mannheim, Germany
[4] Heidelberg Univ, Cent Inst Mental Hlth, Med Fac, Dept Psychiat & Psychotherapy, J5, D-68159 Mannheim, Germany
[5] Fudan Univ, Ctr Evolutionary Biol, Sch Life Sci, Shanghai, Peoples R China
Keywords
BioM2; machine learning; phenotype prediction; DNA methylome; transcriptome; Gene Ontology; EXPRESSION; BRAIN; PATHWAY
DOI
10.1093/bib/bbae384
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Navigating the complex landscape of high-dimensional omics data with machine learning models presents a significant challenge. Integrating biological domain knowledge into these models has shown promise in creating more meaningful stratifications of predictor variables, leading to algorithms that are both more accurate and more generalizable. However, machine learning tools capable of incorporating such biological knowledge are not yet widely available. Addressing this gap, we introduce BioM2, a novel R package designed for biologically informed multistage machine learning. BioM2 uses biological information to stratify and aggregate high-dimensional biological data in the context of machine learning. Applied to genome-wide DNA methylation and transcriptome-wide gene expression data, BioM2 has been shown to enhance predictive performance, surpassing traditional machine learning models that operate without the integration of biological knowledge. A key feature of BioM2 is its ability to rank predictor variables within biological categories, specifically Gene Ontology pathways. This functionality not only aids the interpretability of the results but also enables a subsequent modular network analysis of these variables, shedding light on the systems-level biology underpinning the predictive outcome. We have proposed a biologically informed multistage machine learning framework, termed BioM2, for phenotype prediction based on omics data. BioM2 is available as the BioM2 package on CRAN (https://cran.r-project.org/web/packages/BioM2/index.html).
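The stratify-then-aggregate idea in the abstract can be illustrated with a minimal sketch. It is written in Python rather than R and is not the BioM2 package's interface: every function name here is hypothetical, the nearest-centroid learner is a stand-in for whatever models the package actually uses, and the pathway map is a toy assumption. Stage 1 fits one model per pathway on that pathway's features only; stage 2 fits a model on the resulting pathway-level scores, which are computed out-of-fold so the second stage does not train on scores from models that have already seen the same samples.

```python
import random
from statistics import mean

def centroid(rows):
    """Column-wise mean of equal-length numeric rows."""
    return [mean(col) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_centroids(X, y):
    """Stand-in base learner: one centroid per class (labels are 0 or 1)."""
    c0 = centroid([x for x, label in zip(X, y) if label == 0])
    c1 = centroid([x for x, label in zip(X, y) if label == 1])
    return c0, c1

def score(model, x):
    """Positive when x lies closer to the class-1 centroid."""
    c0, c1 = model
    return sq_dist(x, c0) - sq_dist(x, c1)

def select(X, cols):
    return [[row[j] for j in cols] for row in X]

def pathway_scores(X_train, y_train, X_new, pathways):
    """Stage 1: one model per pathway, giving one score per sample and pathway."""
    out = [[] for _ in X_new]
    for cols in pathways.values():
        model = fit_centroids(select(X_train, cols), y_train)
        for i, x in enumerate(select(X_new, cols)):
            out[i].append(score(model, x))
    return out

def out_of_fold_scores(X, y, pathways, k=5):
    """Pathway-level features for stage 2, each scored by models that did not see it."""
    n = len(X)
    Z = [None] * n
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        scores = pathway_scores([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in test], pathways)
        for i, s in zip(test, scores):
            Z[i] = s
    return Z

def fit_two_stage(X, y, pathways, k=5):
    """Stage 2: fit the base learner on out-of-fold pathway-level scores."""
    return fit_centroids(out_of_fold_scores(X, y, pathways, k), y)

def predict_two_stage(X_train, y_train, stage2, X_new, pathways):
    Z_new = pathway_scores(X_train, y_train, X_new, pathways)
    return [1 if score(stage2, z) > 0 else 0 for z in Z_new]

def make_data(n, seed):
    """Toy data: features 0-2 carry a class signal, features 3-5 are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[(2.0 * label if j < 3 else 0.0) + rng.gauss(0, 1) for j in range(6)]
         for label in y]
    return X, y

if __name__ == "__main__":
    pathways = {"pathway_A": [0, 1, 2], "pathway_B": [3, 4, 5]}  # toy feature-to-pathway map
    X_tr, y_tr = make_data(60, seed=0)
    X_te, y_te = make_data(100, seed=1)
    stage2 = fit_two_stage(X_tr, y_tr, pathways)
    pred = predict_two_stage(X_tr, y_tr, stage2, X_te, pathways)
    print(sum(p == t for p, t in zip(pred, y_te)) / len(y_te))
```

In a real analysis the pathway map would come from an annotation source such as Gene Ontology, and features that belong to several pathways would simply appear in several stage-1 models.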
Pages: 13