SPPPred: Sequence-Based Protein-Peptide Binding Residue Prediction Using Genetic Programming and Ensemble Learning

Cited by: 4
Authors
Shafiee, Shima [1]
Fathi, Abdolhossein [1]
Taherzadeh, Ghazaleh [2]
Affiliations
[1] Razi Univ, Dept Comp Engn & Informat Technol, Kermanshah 6714414971, Iran
[2] Wilkes Univ, Dept Math & Comp Sci, Wilkes Barre, PA 18766 USA
Keywords
Proteins; Feature extraction; Prediction algorithms; Classification algorithms; Support vector machines; Amino acids; Peptides; Binding residue prediction; ensemble learning; genetic programming; protein-peptide interaction; sequence-based; AMINO-ACID; SITES;
DOI
10.1109/TCBB.2022.3230540
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Peptide-binding proteins play significant roles in various biological processes such as gene expression, metabolism, signal transmission, DNA (deoxyribonucleic acid) repair, and replication. Identifying the binding residues in protein-peptide complexes, especially from sequence alone, is challenging both experimentally and computationally. Although several computational approaches have been introduced to predict these binding residues, there is still ample room to improve prediction performance. In this work, we introduce a novel ensemble machine learning-based approach called SPPPred (Sequence-based Protein-Peptide binding residue Prediction) to predict protein-peptide binding residues. First, we extract relevant sequence information and employ a genetic programming algorithm for feature construction to find more discriminative features. We then build an ensemble-based machine learning classifier to predict binding residues. The proposed method shows consistent and comparable performance on both ten-fold cross-validation and an independent test set. On the independent test set, SPPPred yields an F-Measure (F-M), Accuracy (ACC), and Matthews' Correlation Coefficient (MCC) of 0.310, 0.949, and 0.230, respectively, outperforming competing methods by up to approximately 9%. SPPPred is publicly available at https://github.com/GTaherzadeh/SPPPred.git.
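The three metrics reported in the abstract (F-M, ACC, MCC) can all be computed from the counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; it is not taken from the SPPPred repository, and the function names and the toy labels are hypothetical, chosen only for illustration. Binding residues are assumed to be labeled 1 and non-binding residues 0.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 = binding residue."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f_measure(tp, fp, fn):
    """F-measure: harmonic mean of precision and recall, 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels (not data from the paper): 2 binding residues out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"F-M={f_measure(tp, fp, fn):.3f}  "
      f"ACC={accuracy(tp, tn, fp, fn):.3f}  "
      f"MCC={mcc(tp, tn, fp, fn):.3f}")
```

Because binding residues are typically a small minority of all residues, accuracy can be high while F-M and MCC stay modest, which is one plausible reading of the gap between the reported ACC and the other two scores.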
Pages: 2029-2040
Number of pages: 12