PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations

Cited by: 10
Authors
Auliah, Firda Nurul [1]
Nilamyani, Andi Nur [1]
Shoombuatong, Watshara [2]
Alam, Md Ashad [3]
Hasan, Md Mehedi [1, 4]
Kurata, Hiroyuki [1]
Affiliations
[1] Kyushu Inst Technol, Dept Biosci & Bioinformat, 680-4 Kawazu, Iizuka, Fukuoka 8208502, Japan
[2] Mahidol Univ, Fac Med Technol, Ctr Data Min & Biomed Informat, Bangkok 10700, Thailand
[3] Tulane Univ, Tulane Ctr Biomed Informat & Genom, Div Biomed Informat & Genom, John W Deming Dept Med,Sch Med, New Orleans, LA 70112 USA
[4] Japan Soc Promot Sci, Chiyoda Ku, 5-3-1 Kojimachi, Tokyo 1020083, Japan
Funding
Japan Society for the Promotion of Science
Keywords
pupylation; feature encoding; chi-squared; machine learning; BIOINFORMATICS TOOLS; IDENTIFICATION; DATABASE; DOP;
DOI
10.3390/ijms22042120
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Pupylation is a reversible post-translational modification of proteins that plays a key role in the cellular function of microbial organisms. Several proteomics methods have been developed for the prediction and analysis of pupylated proteins and pupylation sites. However, traditional experimental methods are laborious and time-consuming. Hence, computational algorithms that can predict potential pupylation sites from sequence features are highly desirable. In this research, a new prediction model, PUP-Fuse, was developed for pupylation site prediction by integrating multiple sequence representations. We explored five types of feature encoding approaches and three machine learning (ML) algorithms. In the final model, we integrated the successive ML scores using a linear regression model. PUP-Fuse achieved a Matthews correlation coefficient of 0.768 in a 10-fold cross-validation test. It also outperformed existing predictors in an independent test. The PUP-Fuse web server, with curated datasets, is freely available.
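The Matthews correlation coefficient reported in the abstract is computed from the four cells of a binary confusion matrix. The following is a minimal sketch of that standard formula, not the authors' evaluation code; the function name and the example counts are hypothetical and chosen only for illustration.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives. Returns 0.0 when the denominator is zero, which is a
    common convention for the undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts, NOT taken from the PUP-Fuse paper.
example_mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(f"MCC = {example_mcc:.3f}")
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which makes it a reasonable summary when positive and negative sites are imbalanced.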
Pages: 1-12
Number of pages: 12