A cost-effective machine learning-based method for preeclampsia risk assessment and driver genes discovery

Cited by: 23
Authors
Wang, Hao [1, 2]
Zhang, Zhaoyue [3]
Li, Haicheng [1, 2]
Li, Jinzhao [1]
Li, Hanshuang [1]
Liu, Mingzhu [1, 2]
Liang, Pengfei [1]
Xi, Qilemuge [1]
Xing, Yongqiang [4]
Yang, Lei [5]
Zuo, Yongchun [1, 2]
Affiliations
[1] Inner Mongolia Univ, Coll Life Sci, State Key Lab Reprod Regulat & Breeding Grassland, Hohhot 010070, Peoples R China
[2] Inner Mongolia Wesure Date Technol Co Ltd, Inner Mongolia Intelligent Union Big Data Acad, Digital Coll, Hohhot 010010, Peoples R China
[3] Univ Elect Sci & Technol China, Ctr Informat Biol, Sch Life Sci & Technol, Chengdu 610054, Peoples R China
[4] Inner Mongolia Univ Sci & Technol, Sch Life Sci & Technol, Baotou 014010, Peoples R China
[5] Harbin Med Univ, Coll Bioinformat Sci & Technol, Harbin 150081, Peoples R China
Keywords
Preeclampsia risk; Machine learning; Feature selection; Marker genes; Web server; SINGLE-CELL; CANCER CLASSIFICATION; DIFFERENTIATION; EXPRESSION; IDENTIFICATION; PREDICTION
DOI
10.1186/s13578-023-00991-y
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Background: The placenta, as a unique exchange organ between mother and fetus, is essential for successful human pregnancy and fetal health. Preeclampsia (PE) caused by placental dysfunction contributes to both maternal and infant morbidity and mortality. Accurate identification of PE patients plays a vital role in the formulation of treatment plans. However, traditional clinical methods for diagnosing PE have a high misdiagnosis rate.
Results: Here, we first designed a computational biology method that used single-cell transcriptome (scRNA-seq) data from healthy pregnancies (38 wk) and early-onset PE (28-32 wk) to identify pathological cell subpopulations and predict PE risk. Based on machine learning methods and feature selection techniques, we observed that the Tuning ReliefF (TURF) score combined with XGBoost (TURF_XGB) achieved optimal performance, with 92.61% accuracy and 92.46% recall for classifying nine cell subpopulations of healthy placentas. Biological landscapes of placental heterogeneity could be mapped by the 110 marker genes screened by TURF_XGB, which revealed the superiority of the TURF feature mining. Moreover, we processed the PE dataset with LASSO to obtain 497 biomarkers. Integration analysis of the above two gene sets revealed that dendritic cells were closely associated with early-onset PE, and that C1QB and C1QC might drive preeclampsia by mediating inflammation. In addition, an ensemble model-based risk stratification card was developed to classify preeclampsia patients, and its area under the receiver operating characteristic curve (AUC) could reach 0.99. For broader accessibility, we designed an online web server.
Conclusion: Single-cell transcriptome-based preeclampsia risk assessment using an ensemble machine learning framework is a valuable asset for clinical decision-making. C1QB and C1QC may be involved in the development and progression of early-onset PE by affecting the complement and coagulation cascades pathway that mediates inflammation, which has important implications for better understanding the pathogenesis of PE.
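The abstract reports three evaluation metrics (accuracy, recall, and AUC) without defining them. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names and the toy labels and scores are hypothetical, and macro-averaging of recall across classes is an assumption, since the abstract does not state which averaging was used.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed averaging scheme)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive sample is scored
    above a random negative sample; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three made-up cell-type labels, not study data.
cell_true = ["A", "A", "B", "B", "B", "C"]
cell_pred = ["A", "B", "B", "B", "B", "C"]
acc = accuracy(cell_true, cell_pred)
rec = macro_recall(cell_true, cell_pred)

# Hypothetical binary risk labels (1 = PE, 0 = healthy) and model scores.
risk_true = [1, 1, 0, 0]
risk_score = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(risk_true, risk_score)
```

The pairwise definition of AUC used here is quadratic in the number of samples, which is fine for illustration; established libraries compute the same quantity from sorted ranks.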
Pages: 12