Porpoise: a new approach for accurate prediction of RNA pseudouridine sites

Cited by: 44

Authors:

Li, Fuyi ^{[1]}

Guo, Xudong ^{[2]}

Jin, Peipei ^{[3]}

Chen, Jinxiang ^{[4]}

Xiang, Dongxu ^{[5]}

Song, Jiangning ^{[6,7]}

Coin, Lachlan J. M. ^{[8,9]}

Affiliations:

[1] Univ Melbourne, Peter Doherty Inst Infect & Immun, Dept Microbiol & Immunol, 792 Elizabeth St, Melbourne, Vic 3000, Australia

[2] Ningxia Univ, Yinchuan, Ningxia, Peoples R China

[3] Shanghai Jiao Tong Univ, Dept Clin Lab, Ruijin Hosp, Sch Med, Shanghai, Peoples R China

[4] Northwest A&F Univ, Xianyang, Peoples R China

[5] Univ Melbourne, Fac Engn & Informat Technol, Melbourne, Vic, Australia

[6] Monash Univ, Monash Biomed Discovery Inst, Melbourne, Vic, Australia

[7] Monash Univ, Monash Data Futures Inst, Melbourne, Vic, Australia

[8] Univ Melbourne, Dept Microbiol & Immunol, Melbourne, Vic, Australia

[9] Univ Melbourne, Dept Clin Pathol, Melbourne, Vic, Australia

Source:

BRIEFINGS IN BIOINFORMATICS | 2021 / Vol. 22 / Issue 06

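Funding:

National Health and Medical Research Council of Australia; Australian Research Council; UK Medical Research Council; US National Institutes of Health;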

Keywords:

RNA pseudouridine site; bioinformatics; sequence analysis; machine learning; stacking ensemble learning; YEAST; MODEL;

DOI:

10.1093/bib/bbab245

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Pseudouridine is a ubiquitous RNA modification type present in eukaryotes and prokaryotes, and it plays a vital role in various biological processes. Almost all kinds of RNAs are subject to this modification. However, identifying pseudouridine sites via experimental approaches remains a great challenge, because it requires expensive and time-consuming experimental research. Therefore, computational approaches that can accurately identify pseudouridine sites in silico from the large amount of RNA sequence data are highly desirable and can aid in the functional elucidation of this critical modification. Here, we propose a new computational approach, termed Porpoise, to accurately identify pseudouridine sites from RNA sequence data. Porpoise builds upon a comprehensive evaluation of 18 frequently used feature encoding schemes, from which four types of features are selected: binary features, pseudo k-tuple composition, nucleotide chemical property and position-specific trinucleotide propensity based on single-strand (PSTNPss). The selected features are fed into a stacked ensemble learning framework to construct an effective stacked model. Both cross-validation tests on the benchmark dataset and independent tests show that Porpoise achieves better predictive performance than several state-of-the-art approaches. The application of model interpretation tools demonstrates the importance of PSTNPss for the performance of the trained models. This new method is anticipated to facilitate community-wide efforts to identify putative pseudouridine sites and formulate novel testable biological hypotheses.
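
To make two of the feature types named in the abstract concrete, the following is a minimal Python sketch of binary (one-hot) encoding and nucleotide chemical property (NCP) encoding for an RNA window. It is not taken from the Porpoise implementation: the function names, the one-hot column order, and the NCP triplets (ring structure, functional group, hydrogen bonding) are assumptions based on the convention commonly used in RNA-modification predictors.

```python
# Illustrative sketch only; names and lookup tables are assumptions,
# not the Porpoise authors' code.

# Binary (one-hot) encoding: one 4-bit vector per nucleotide (assumed order A, C, G, U).
BINARY = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# NCP encoding (assumed convention): (purine ring?, amino group?, weak hydrogen bond?).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode(seq, table):
    """Concatenate the per-nucleotide vectors from `table` into one flat list."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
    features = []
    for base in seq:
        if base not in table:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        features.extend(table[base])
    return features


def binary_encode(seq):
    """One-hot encode a sequence: 4 features per nucleotide."""
    return encode(seq, BINARY)


def ncp_encode(seq):
    """Chemical-property encode a sequence: 3 features per nucleotide."""
    return encode(seq, NCP)


if __name__ == "__main__":
    window = "GAUCU"  # hypothetical sequence window, for illustration only
    print(binary_encode(window))
    print(ncp_encode(window))
```

In a stacked setup of the kind the abstract describes, vectors like these would be passed to base classifiers whose outputs feed a second-level model; that step is omitted here because the abstract does not specify the learners used.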

Pages: 12
