m5U-SVM: identification of RNA 5-methyluridine modification sites based on multi-view features of physicochemical features and distributed representation

Cited by: 53
Authors
Ao, Chunyan [1, 2, 3]
Ye, Xiucai [2]
Sakurai, Tetsuya [2]
Zou, Quan [3, 4]
Yu, Liang [1]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian, Peoples R China
[2] Univ Tsukuba, Dept Comp Sci, Tsukuba, Japan
[3] Univ Elect Sci & Technol China, Inst Fundamental & Frontier Sci, Chengdu, Peoples R China
[4] Univ Elect Sci & Technol China, Yangtze Delta Reg Inst Quzhou, Quzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
5-Methyluridine; Support vector machines; Multi-view feature; Word2Vec; WEB SERVER; METHYLTRANSFERASE; PREDICTION; LANDSCAPE; BINDING;
DOI
10.1186/s12915-023-01596-0
Chinese Library Classification number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: RNA 5-methyluridine (m5U) modifications are produced by methylation at the C-5 position of uridine, catalyzed by pyrimidine methylation transferase, and are related to the development of human diseases. Accurate identification of m5U modification sites from RNA sequences can contribute to the understanding of their biological functions and the pathogenesis of related diseases. Compared with traditional experimental methods, easy-to-use computational methods based on machine learning can identify modification sites from RNA sequences efficiently and quickly. Despite the good performance of these computational methods, they have some drawbacks and limitations.
Results: In this study, we developed a novel predictor, m5U-SVM, which uses multi-view features and machine learning algorithms to build predictive models for identifying m5U modification sites from RNA sequences. The method uses four traditional physicochemical features and distributed representation features. The four physicochemical features were fused and then optimized with a two-step LightGBM and IFS procedure; the distributed representation features were then fused with the optimized physicochemical features to obtain the new multi-view features. The best-performing classifier, a support vector machine, was identified by screening different machine learning algorithms. In comparison, the proposed model performs better than the existing state-of-the-art tool.
Conclusions: m5U-SVM provides an effective tool that successfully captures sequence-related attributes of modifications and can accurately predict m5U modification sites from RNA sequences. The identification of m5U modification sites helps in understanding and exploring the related biological processes and functions.
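The feature steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation; it only shows the general shape of the ideas under stated assumptions. The helper names (`kmer_tokens`, `fuse_views`, `incremental_feature_selection`) are hypothetical, tokenizing a sequence into overlapping k-mers is an assumed way of preparing "words" for a Word2Vec-style model, fusion is assumed to be plain concatenation, and the feature ranking and the scoring callback are placeholders for an importance ranking (such as one derived from LightGBM) and a classifier evaluation.

```python
def kmer_tokens(seq, k=3):
    """Split an RNA sequence into overlapping k-mers.

    Assumption: these tokens would serve as the "words" fed to a
    Word2Vec-style model; the abstract does not specify the tokenization.
    """
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fuse_views(*views):
    """Fuse several feature vectors by concatenation (assumed fusion rule)."""
    fused = []
    for view in views:
        fused.extend(view)
    return fused


def incremental_feature_selection(ranked, evaluate):
    """Incremental feature selection (IFS) over a ranked feature list.

    ranked:   feature indices ordered from most to least important
              (hypothetically taken from a model's importance scores).
    evaluate: hypothetical callback that maps a list of feature indices
              to a score where higher is better.
    Returns the best-scoring prefix of the ranking and its score.
    On ties, the shorter prefix is kept.
    """
    best_subset, best_score = [], float("-inf")
    for n in range(1, len(ranked) + 1):
        subset = ranked[:n]
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


if __name__ == "__main__":
    print(kmer_tokens("GGACU", 3))  # ['GGA', 'GAC', 'ACU']

    # Toy scoring callback: pretend the top two ranked features score best.
    toy_scores = {1: 0.6, 2: 0.9, 3: 0.8}
    subset, score = incremental_feature_selection(
        [4, 0, 2], lambda s: toy_scores[len(s)]
    )
    print(subset, score)  # [4, 0] 0.9
```

In practice the `evaluate` callback would wrap something like a cross-validated classifier score computed on the selected feature columns; the toy dictionary here only keeps the example self-contained.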
Pages: 14