PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations

Cited by: 10

Authors:

Auliah, Firda Nurul [1]

Nilamyani, Andi Nur [1]

Shoombuatong, Watshara [2]

Alam, Md Ashad [3]

Hasan, Md Mehedi [1,4]

Kurata, Hiroyuki [1]

Affiliations:

[1] Kyushu Inst Technol, Dept Biosci & Bioinformat, 680-4 Kawazu, Iizuka, Fukuoka 8208502, Japan

[2] Mahidol Univ, Fac Med Technol, Ctr Data Min & Biomed Informat, Bangkok 10700, Thailand

[3] Tulane Univ, Tulane Ctr Biomed Informat & Genom, Div Biomed Informat & Genom, John W Deming Dept Med,Sch Med, New Orleans, LA 70112 USA

[4] Japan Soc Promot Sci, Chiyoda Ku, 5-3-1 Kojimachi, Tokyo 1020083, Japan

Source:

INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES | 2021 / Volume 22 / Issue 04

Funding:

Japan Society for the Promotion of Science;

Keywords:

pupylation; feature encoding; chi-squared; machine learning; BIOINFORMATICS TOOLS; IDENTIFICATION; DATABASE; DOP;

DOI:

10.3390/ijms22042120

Chinese Library Classification number:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Pupylation is a reversible post-translational modification of proteins that plays a key role in the cellular function of microbial organisms. Several proteomics methods have been developed for the prediction and analysis of pupylated proteins and pupylation sites. However, these traditional experimental methods are laborious and time-consuming, so computational algorithms that can predict potential pupylation sites from sequence features are highly needed. In this research, a new prediction model, PUP-Fuse, was developed for pupylation site prediction by integrating multiple sequence representations. We explored five types of feature encoding approaches and three machine learning (ML) algorithms. In the final model, we integrated the successive ML scores using a linear regression model. PUP-Fuse achieved a Matthews correlation coefficient of 0.768 in a 10-fold cross-validation test. It also outperformed existing predictors in an independent test. The PUP-Fuse web server, with curated datasets, is freely available.
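
The Matthews correlation coefficient (MCC) reported in the abstract is a standard summary of binary classification quality, computed from the counts of true and false positives and negatives. The following is a minimal sketch of that calculation, not the authors' code: the function name `matthews_corrcoef`, the label convention (1 = pupylated site, 0 = non-pupylated site), and the toy labels are all assumptions made for illustration.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Return the MCC for binary labels.

    Assumed convention (hypothetical): 1 = pupylated site, 0 = non-pupylated.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is taken as 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels, invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
print(matthews_corrcoef(y_true, y_pred))
```

MCC ranges from -1 (complete disagreement) through 0 (no better than chance) to 1 (perfect prediction). It is often preferred over accuracy when positive and negative sites are imbalanced.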


Pages: 1 / 12

Page count: 12
