BioM2: biologically informed multi-stage machine learning for phenotype prediction using omics data

Cited by: 0
Authors
Zhang, Shunjie [1]
Li, Pan [2]
Wang, Shenghan [2]
Zhu, Jijun [2]
Huang, Zhongting [2]
Cai, Fuqiang
Freidel, Sebastian [4]
Ling, Fei [1, 2]
Schwarz, Emanuel [3, 4]
Chen, Junfang [2, 5]
Affiliations
[1] South China Univ Technol, Sch Biol & Biol Engn, Guangzhou, Peoples R China
[2] Fudan Univ, Greater Bay Area Inst Precis Med Guangzhou, Ctr Intelligent Med, Sch Life Sci, 6,2nd Nanjiang Rd, Guangzhou 511462, Peoples R China
[3] Heidelberg Univ, Hector Inst Artificial Intelligence Psychiat, Med Fac Mannheim, Cent Inst Mental Hlth, M7, D-68161 Mannheim, Germany
[4] Heidelberg Univ, Cent Inst Mental Hlth, Med Fac, Dept Psychiat & Psychotherapy, J5, D-68159 Mannheim, Germany
[5] Fudan Univ, Ctr Evolutionary Biol, Sch Life Sci, Shanghai, Peoples R China
Keywords
BioM2; machine learning; phenotype prediction; DNA methylome; transcriptome; Gene Ontology; EXPRESSION; BRAIN; PATHWAY
DOI
10.1093/bib/bbae384
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Navigating the complex landscape of high-dimensional omics data with machine learning models presents a significant challenge. Integrating biological domain knowledge into these models has shown promise in creating more meaningful stratifications of predictor variables, leading to algorithms that are both more accurate and more generalizable. However, machine learning tools capable of incorporating such biological knowledge are not yet widely available. To address this gap, we introduce BioM2, a novel R package designed for biologically informed multistage machine learning. BioM2 leverages biological information to stratify and aggregate high-dimensional biological data in the context of machine learning. Applied to genome-wide DNA methylation and transcriptome-wide gene expression data, BioM2 has been shown to enhance predictive performance, surpassing traditional machine learning models that operate without the integration of biological knowledge. A key feature of BioM2 is its ability to rank predictor variables within biological categories, specifically Gene Ontology pathways. This functionality not only aids the interpretability of the results but also enables a subsequent modular network analysis of these variables, shedding light on the systems-level biology underpinning the predictive outcome. In summary, we have proposed a biologically informed multistage machine learning framework, termed BioM2, for phenotype prediction based on omics data. The framework is implemented in the BioM2 R package, available on CRAN (https://cran.r-project.org/web/packages/BioM2/index.html).
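The stratify-then-aggregate idea described in the abstract can be illustrated with a minimal sketch. This is not the BioM2 API (BioM2 is an R package, and its actual learners and resampling scheme are not described in this record). The sketch assumes a simple nearest-centroid learner at both stages, a hypothetical mapping of six features to two made-up pathways (`pathway_A`, `pathway_B`), and synthetic data in which only `pathway_A` carries signal. For brevity, stage 2 is trained on in-sample stage-1 scores, which a real pipeline would likely replace with cross-validated scores.

```python
import random

def fit_centroids(X, y, cols):
    """Return per-class mean vectors over the selected feature columns."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    return cents

def score(x, cols, cents):
    """Positive when x lies closer to the class-1 centroid than to class 0."""
    d0 = sum((x[j] - m) ** 2 for j, m in zip(cols, cents[0]))
    d1 = sum((x[j] - m) ** 2 for j, m in zip(cols, cents[1]))
    return d0 - d1

def transform(x, pathways, stage1):
    """Aggregate raw features into one stage-1 score per pathway."""
    return [score(x, cols, stage1[name]) for name, cols in pathways.items()]

def fit_two_stage(X, y, pathways):
    # Stage 1: one small model per pathway (feature stratification).
    stage1 = {name: fit_centroids(X, y, cols) for name, cols in pathways.items()}
    # Stage 2: one model on the pathway-level scores (aggregation).
    Z = [transform(x, pathways, stage1) for x in X]
    stage2 = fit_centroids(Z, y, range(len(pathways)))
    return stage1, stage2

def predict(x, pathways, stage1, stage2):
    z = transform(x, pathways, stage1)
    return int(score(z, range(len(z)), stage2) > 0)

def rank_pathways(Z, y, names):
    """Rank pathways by the gap in mean stage-1 score between the classes."""
    gaps = {}
    for k, name in enumerate(names):
        pos = [z[k] for z, label in zip(Z, y) if label == 1]
        neg = [z[k] for z, label in zip(Z, y) if label == 0]
        gaps[name] = sum(pos) / len(pos) - sum(neg) / len(neg)
    return sorted(gaps, key=gaps.get, reverse=True)

def make_data(n, rng):
    """Synthetic data: class 1 is shifted on features 0-2 only."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(0.0, 1.0) for _ in range(6)]
        if label == 1:
            for j in range(3):
                x[j] += 1.5
        X.append(x)
        y.append(label)
    return X, y

# Hypothetical feature-to-pathway mapping (not real Gene Ontology terms).
pathways = {"pathway_A": [0, 1, 2], "pathway_B": [3, 4, 5]}

rng = random.Random(0)
X_train, y_train = make_data(100, rng)
X_test, y_test = make_data(100, rng)

stage1, stage2 = fit_two_stage(X_train, y_train, pathways)
preds = [predict(x, pathways, stage1, stage2) for x in X_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)

Z_test = [transform(x, pathways, stage1) for x in X_test]
ranking = rank_pathways(Z_test, y_test, list(pathways))
print("held-out accuracy:", accuracy)
print("pathway ranking:", ranking)
```

Because each pathway is reduced to a single score before the final model is fit, the stage-2 input has one column per pathway rather than one per probe or gene, and the ranking step operates directly on those pathway-level columns.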
Pages: 13