TMSC-m7G: A transformer architecture based on multi-sense-scaled embedding features and convolutional neural network to identify RNA N7-methylguanosine sites

Cited by: 8
Authors
Zhang, Shengli [1 ,2 ]
Xu, Yujie [1 ]
Liang, Yunyun [3 ]
Affiliations
[1] Xidian Univ, Sch Math & Stat, Xian 710071, Peoples R China
[2] Key Lab Computat Sci & Applicat Hainan Prov, Haikou 571158, Peoples R China
[3] Xian Polytech Univ, Sch Sci, Xian 710048, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
RNA N7-methylguanosine; Natural language processing; Word embedding; Transformer; Convolutional neural network; CAP STRUCTURE; CD-HIT; MODEL; IDENTIFICATION; REVEALS; PROTEIN; ROLES; CODE;
DOI
10.1016/j.csbj.2023.11.052
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
RNA N7-methylguanosine (m7G) is a crucial chemical modification of RNA molecules that helps maintain RNA function and protein translation. Studying and predicting m7G sites aids in understanding the biological function of RNA and in developing new drug therapies. Machine learning and deep learning techniques have proven particularly effective for this prediction task, improving both accuracy and identification efficiency. In this study, we propose TMSC-m7G, a transformer-based model that integrates natural language processing and deep learning to predict m7G sites. TMSC-m7G replaces traditional word embedding with a combination of multi-sense-scaled token embedding and fixed-position embedding to extract contextual information from sequences. In addition, a convolutional layer is added to the encoder to compensate for the transformer's limited ability to capture local information. The model's robustness and generalization are validated through 10-fold cross-validation and an independent dataset test. The results show that TMSC-m7G outperforms the state-of-the-art models it was compared against, reaching an accuracy of 98.70% on the benchmark dataset and 92.92% on the independent dataset. To make the model easy to use, we have developed an intuitive online prediction tool, freely available at http://39.105.212.81/.
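The 10-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `k_fold_indices`, the shuffling seed, and the absence of class stratification are all assumptions made for illustration, since the abstract does not describe how the folds were built.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: the paper's actual splitting procedure
    (seed, stratification) is not given in the abstract.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Training set is every fold except the held-out one.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Example: 25 sequences split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

Each sequence lands in the held-out fold exactly once, so a per-fold metric averaged over the ten rounds uses every sample for testing without ever training on it in the same round.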
Pages: 129-139
Page count: 11