Domain-knowledge enabled ensemble learning of 5-formylcytosine (f5C) modification sites

Cited by: 0
Authors
Huang, Jiaming [1, 3]
Wang, Xuan [3]
Xia, Rong [4]
Yang, Dongqing [2]
Liu, Jian [1]
Lv, Qi [1]
Yue, Xiaoxuan [5]
Meng, Jia [3, 5, 6, 7]
Chen, Kunqi [8]
Song, Bowen [2]
Wang, Yue [1]
Affiliations
[1] Nanjing Univ Chinese Med, Sch Pharm, Jiangsu Key Lab Funct Subst Chinese Med, Nanjing 210023, Peoples R China
[2] Nanjing Univ Chinese Med, Sch Med, Dept Publ Hlth, Nanjing 210023, Peoples R China
[3] Xian Jiaotong Liverpool Univ, Sch Sci, Dept Biol Sci, Suzhou 215123, Peoples R China
[4] Xian Jiaotong Liverpool Univ, Sch AI & Adv Comp, Suzhou 215123, Peoples R China
[5] Nanjing Univ Chinese Med, Sch Med, Dept Pharmacol, Nanjing 210023, Peoples R China
[6] Xian Jiaotong Liverpool Univ, AI Univ Res Ctr, Suzhou 215123, Peoples R China
[7] Univ Liverpool, Inst Syst Mol & Integrat Biol, Liverpool L7 8TX, England
[8] Fujian Med Univ, Sch Basic Med Sci, Key Lab Minist Educ Gastrointestinal Canc, Fuzhou 350004, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
RNA modification; Ensemble learning; 5-formylcytidine; Epitranscriptomic marks; Genomic features; RNA 5-METHYLCYTOSINE SITES; WEB SERVER; N-6-METHYLADENOSINE SITES; N6-METHYLADENOSINE SITES; M(6)A SITES; IDENTIFICATION; MODOMICS; PREDICTION; DATABASE; NSUN3;
DOI
10.1016/j.csbj.2024.08.004
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
5-formylcytidine (f5C) is a unique post-transcriptional RNA modification found in mRNA and tRNA at the wobble site, playing a crucial role in mitochondrial protein synthesis and potentially contributing to the regulation of translation. Recent studies have unveiled that the f5C modifications may drive mitochondrial mRNA translation to power cancer metastasis. Accurate identification of f5C sites is essential for further unraveling their molecular functions and regulatory mechanisms, but there are currently no computational methods available for predicting their locations. In this study, we introduce an innovative ensemble approach, successfully enabling the computational recognition of Saccharomyces cerevisiae f5C. We conducted a comprehensive model selection process that involved multiple basic machine learning and deep learning algorithms such as recurrent neural networks, convolutional neural networks and Transformer-based models. Initially trained only on sequence information, these individual models achieved an AUROC ranging from 0.7104 to 0.7492. Through the integration of 32 novel domain-derived genomic features, the performance of individual models has significantly improved to an AUROC between 0.7309 and 0.8076. To further enhance accuracy and robustness, we then constructed the ensembles of these individual models with different combinations. The best performance attained by our ensemble models reached an AUROC of 0.8391. Shapley additive explanations were conducted to explain the significant contributions of genomic features, providing insights into the putative distribution of f5C across various topological regions and potentially paving the way for revealing their functional relevance within distinct genomic contexts. A freely accessible web server that allows real-time analysis of user-uploaded sites can be accessed at: www.rnamd.org/Resf5C-Pred.
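The ensemble step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the individual models' outputs are combined, so unweighted soft voting (averaging predicted probabilities) is an assumption here, and the labels and scores are synthetic stand-ins rather than f5C data. The helper names `auroc`, `soft_vote`, and `noisy_model` are hypothetical.

```python
import random


def auroc(labels, scores):
    """Rank-based AUROC (Mann-Whitney U), using average ranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def soft_vote(prob_lists):
    """Average per-site probabilities across models (unweighted soft voting)."""
    return [sum(p) / len(p) for p in zip(*prob_lists)]


# Synthetic stand-in data: 1 = modified site, 0 = unmodified site.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(500)]


def noisy_model(noise):
    """Hypothetical model: a weak signal plus Gaussian noise, clipped to [0, 1]."""
    return [
        min(1.0, max(0.0, 0.5 + 0.2 * (2 * y - 1) + rng.gauss(0, noise)))
        for y in labels
    ]


model_probs = [noisy_model(0.35) for _ in range(3)]
ensemble_probs = soft_vote(model_probs)

for idx, probs in enumerate(model_probs):
    print(f"model {idx}: AUROC = {auroc(labels, probs):.4f}")
print(f"ensemble: AUROC = {auroc(labels, ensemble_probs):.4f}")
```

Averaging tends to help when the individual models make partly independent errors, which is one plausible reason an ensemble can exceed its best member; the numbers printed here say nothing about the reported results.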
Pages: 3175-3185
Number of pages: 11