XGBoost framework with feature selection for the prediction of RNA N5-methylcytosine sites

Cited by: 27
Authors
Abbas, Zeeshan [1]
Rehman, Mobeen ur [1]
Tayara, Hilal [2]
Zou, Quan [3]
Chong, Kil To [1, 4]
Affiliations
[1] Jeonbuk Natl Univ, Dept Elect & Informat Engn, Jeonju 54896, South Korea
[2] Jeonbuk Natl Univ, Sch Int Engn & Sci, Jeonju 54896, South Korea
[3] Univ Elect Sci & Technol China, Inst Fundamental & Frontier Sci, Chengdu 610054, Peoples R China
[4] Jeonbuk Natl Univ, Adv Elect & Informat Res Ctr, Jeonju 54896, South Korea
Funding
National Research Foundation of Singapore;
Keywords
5-METHYLCYTOSINE SITES; MESSENGER-RNA; METHYLATION; METHYLTRANSFERASE; DATABASE;
DOI
10.1016/j.ymthe.2023.05.016
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005; 0836; 090102; 100705;
Abstract
5-methylcytosine (m5C) is a critical post-transcriptional modification that is widely present in many kinds of RNA and is crucial to fundamental biological processes. Correctly identifying m5C-methylation sites on RNA helps clinicians better understand the precise function of these sites in different biological processes. Because they are effective and affordable, computational methods for identifying methylation sites in various species have received increasing attention over the last few years. To precisely identify RNA m5C sites in five species (Homo sapiens, Arabidopsis thaliana, Mus musculus, Drosophila melanogaster, and Danio rerio), we propose m5C-pred, a more effective and accurate model. m5C-pred combines five distinct feature-encoding techniques to extract features from RNA sequences, uses SHapley Additive exPlanations (SHAP) to select the best features among them, and then applies XGBoost as the classifier. We used Optuna, a recently introduced hyperparameter optimization framework, to determine the best hyperparameters quickly and efficiently. Finally, we evaluated the model on independent test datasets and compared its results with those of previous methods. m5C-pred outperforms the currently available state-of-the-art techniques and is expected to be useful for accurately identifying m5C sites.
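The abstract does not name the five encodings or state how many features SHAP keeps. As a minimal sketch of the encode-then-select idea, assuming k-mer nucleotide composition as one illustrative encoding and assuming SHAP values have already been computed for a fitted XGBoost model (for example with the `shap` package's `TreeExplainer`), features could be ranked by mean absolute SHAP value and the top ones retained. The function names `kmer_composition` and `select_by_mean_abs_shap` and the `top_k` cutoff are hypothetical, not taken from m5C-pred.

```python
from itertools import product
from typing import List, Sequence


def kmer_composition(seq: str, k: int = 2) -> List[float]:
    """Illustrative encoding (an assumption, not necessarily one of the paper's five):
    normalized k-mer frequencies over the RNA alphabet ACGU, giving 4**k features."""
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing ambiguous bases such as N
            counts[j] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def select_by_mean_abs_shap(shap_values: Sequence[Sequence[float]], top_k: int) -> List[int]:
    """Return the indices of the top_k features ranked by mean |SHAP value|.

    shap_values is assumed to be a precomputed samples x features matrix
    of SHAP attributions for a trained classifier."""
    n_samples = len(shap_values)
    n_features = len(shap_values[0])
    importance = [
        sum(abs(row[j]) for row in shap_values) / n_samples for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: importance[j], reverse=True)
    return ranked[:top_k]


# Usage: encode one sequence, then keep the two most influential features.
features = kmer_composition("ACGUAC", k=1)  # A, C, G, U frequencies
kept = select_by_mean_abs_shap([[0.1, -0.5, 0.0], [0.3, 0.4, -0.05]], top_k=2)
```

In the pipeline the abstract describes, only the retained feature columns would then be passed to XGBoost, with its hyperparameters tuned by Optuna; those library calls are omitted here to keep the sketch dependency-free.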
Pages: 2543–2551
Number of pages: 9