BioM2: biologically informed multi-stage machine learning for phenotype prediction using omics data

Cited by: 0
Authors
Zhang, Shunjie [1]
Li, Pan [2]
Wang, Shenghan [2]
Zhu, Jijun [2]
Huang, Zhongting [2]
Cai, Fuqiang
Freidel, Sebastian [4]
Ling, Fei [1,2]
Schwarz, Emanuel [3,4]
Chen, Junfang [2,5]
Affiliations
[1] South China Univ Technol, Sch Biol & Biol Engn, Guangzhou, Peoples R China
[2] Fudan Univ, Greater Bay Area Inst Precis Med Guangzhou, Ctr Intelligent Med, Sch Life Sci, 6,2nd Nanjiang Rd, Guangzhou 511462, Peoples R China
[3] Heidelberg Univ, Hector Inst Artificial Intelligence Psychiat, Med Fac Mannheim, Cent Inst Mental Hlth, M7, D-68161 Mannheim, Germany
[4] Heidelberg Univ, Cent Inst Mental Hlth, Med Fac, Dept Psychiat & Psychotherapy, J5, D-68159 Mannheim, Germany
[5] Fudan Univ, Ctr Evolutionary Biol, Sch Life Sci, Shanghai, Peoples R China
Keywords
BioM2; machine learning; phenotype prediction; DNA methylome; transcriptome; Gene Ontology; EXPRESSION; BRAIN; PATHWAY
DOI
10.1093/bib/bbae384
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Navigating the complex landscape of high-dimensional omics data with machine learning models presents a significant challenge. Integrating biological domain knowledge into these models has shown promise in creating more meaningful stratifications of predictor variables, leading to algorithms that are both more accurate and more generalizable. However, machine learning tools capable of incorporating such biological knowledge are not yet widely available. To address this gap, we introduce BioM2, a novel R package for biologically informed multistage machine learning. BioM2 uses biological information to stratify and aggregate high-dimensional biological data in the context of machine learning. Applied to genome-wide DNA methylation and transcriptome-wide gene expression data, BioM2 has been shown to enhance predictive performance, surpassing traditional machine learning models that operate without the integration of biological knowledge. A key feature of BioM2 is its ability to rank predictor variables within biological categories, specifically Gene Ontology pathways. This functionality not only aids the interpretability of the results but also enables a subsequent modular network analysis of these variables, shedding light on the systems-level biology underpinning the predictive outcome. The BioM2 framework for phenotype prediction from omics data is available as the BioM2 package on CRAN (https://cran.r-project.org/web/packages/BioM2/index.html).
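The "stratify, then aggregate" idea described in the abstract can be illustrated with a minimal two-stage sketch. This is not the BioM2 implementation or its API (BioM2 is an R package); it is a generic Python illustration under stated assumptions. The pathway names `GO:A` and `GO:B`, the toy data, the centroid-difference scorer used in stage 1, and the weight-based pathway ranking are all hypothetical choices made for this example.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_pathway_scorer(X, y, feature_idx):
    """Stage 1: centroid-difference linear score using one pathway's features only."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    params = []
    for j in feature_idx:
        mu_pos = sum(r[j] for r in pos) / len(pos)
        mu_neg = sum(r[j] for r in neg) / len(neg)
        params.append((j, mu_pos - mu_neg, (mu_pos + mu_neg) / 2.0))
    return lambda row: sum(w * (row[j] - mid) for j, w, mid in params)

def standardize(column):
    # Put pathway scores on a common scale so stage-2 weights are comparable.
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(var) or 1.0
    return [(v - mean) / std for v in column]

def fit_logistic(Z, y, lr=0.5, epochs=300):
    """Stage 2: gradient-descent logistic regression on pathway-level scores."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        errs = [sigmoid(b + sum(wk * zk for wk, zk in zip(w, z))) - label
                for z, label in zip(Z, y)]
        w = [wk - lr * sum(e * z[k] for e, z in zip(errs, Z)) / n
             for k, wk in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b

# Toy data (assumption): 40 samples, 6 features, two hypothetical pathways.
# Features 0-2 ("GO:A") carry signal; features 3-5 ("GO:B") are pure noise.
random.seed(0)
pathways = {"GO:A": [0, 1, 2], "GO:B": [3, 4, 5]}
y = [i % 2 for i in range(40)]
X = [[random.gauss(2.0 * label if j < 3 else 0.0, 1.0) for j in range(6)]
     for label in y]

# Stage 1: one model per pathway yields one pathway-level feature per sample.
scorers = {name: fit_pathway_scorer(X, y, idx) for name, idx in pathways.items()}
columns = [standardize([scorers[name](row) for row in X]) for name in pathways]
Z = [list(sample) for sample in zip(*columns)]

# Stage 2: a single model over the aggregated pathway-level features.
w, b = fit_logistic(Z, y)
preds = [1 if sigmoid(b + sum(wk * zk for wk, zk in zip(w, z))) >= 0.5 else 0
         for z in Z]
accuracy = sum(p == label for p, label in zip(preds, y)) / len(y)

# Rank pathways by absolute stage-2 weight (an illustrative choice).
ranked = sorted(zip(pathways, w), key=lambda item: abs(item[1]), reverse=True)
print(accuracy, ranked)
```

The sketch computes stage-1 scores and accuracy on the training samples themselves, which is optimistic. A realistic pipeline would generate pathway-level features from held-out predictions (for example, within cross-validation) and evaluate on independent data.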
Pages: 13