Predicting Protein-Protein Interaction Sites Using Sequence Descriptors and Site Propensity of Neighboring Amino Acids

Cited by: 11
Authors
Kuo, Tzu-Hao [1]
Li, Kuo-Bin [1, 2]
Affiliations
[1] Natl Yang Ming Univ, Inst Biomed Informat, Taipei 112, Taiwan
[2] Natl Yang Ming Univ Hosp, Off Informat Management, Yilan 260, Taiwan
Keywords
Protein-Protein Interaction; intrinsically disordered protein; machine learning algorithms; PSEUDO NUCLEOTIDE COMPOSITION; LYSINE SUCCINYLATION SITES; SUPPORT VECTOR MACHINES; PHYSICOCHEMICAL PROPERTIES; EVOLUTIONARY CONSERVATION; STATISTICAL-ANALYSIS; ENSEMBLE CLASSIFIER; GENERAL-FORM; PSI-BLAST; DATABASE
DOI
10.3390/ijms17111788
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Information about the interface sites of protein-protein interactions (PPIs) is useful in many areas of biological research. However, despite advances in experimental techniques, identifying PPI sites remains a challenging task. Using a statistical learning technique, we propose a computational tool for predicting PPI sites. Unlike similar approaches that require structural information, the proposed method takes all of its input from protein sequences. In addition to typical sequence features, our method takes into account that interaction sites are not randomly distributed over the protein sequence. We characterized this positional preference using protein complexes with known structures, proposed a numerical index to estimate the propensity, and then incorporated the index into a learning system. The resulting predictor, without using structural information, yields an area under the ROC curve (AUC) of 0.675, a recall of 0.597, a precision of 0.311 and an accuracy of 0.583 in a ten-fold cross-validation experiment. This performance is comparable to that of a previous approach that used structural information. After adding B-factor data to our predictor, we demonstrated that the AUC can be further improved to 0.750. The tool is accessible at http://bsaltools.ym.edu.tw/predppis.
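The abstract does not give the formula for the propensity index. The Python sketch below shows one plausible way to turn "site propensity of neighboring amino acids" into a per-residue feature, together with the recall, precision, accuracy and AUC measures quoted in the abstract. It is an illustration under stated assumptions, not the authors' published index: the enrichment-ratio definition of propensity, the symmetric window of ±3 residues that excludes the centre residue, the neutral value 1.0 for unseen residue types, and all function names (`residue_propensity`, `neighbor_propensity`, `classification_metrics`, `roc_auc`) are hypothetical. The classifier itself is not shown.

```python
from collections import Counter


def residue_propensity(sequences, labels):
    """Interface propensity per amino-acid type (ASSUMED enrichment ratio).

    propensity(a) = P(site | residue type a) / P(site over all residues).
    Values above 1 mean type `a` is enriched at interaction sites.
    """
    total = Counter()
    sites = Counter()
    for seq, lab in zip(sequences, labels):
        for aa, is_site in zip(seq, lab):
            total[aa] += 1
            sites[aa] += int(is_site)
    n_total = sum(total.values())
    n_sites = sum(sites.values())
    if n_sites == 0:
        raise ValueError("training data contain no interaction sites")
    overall = n_sites / n_total
    return {aa: (sites[aa] / total[aa]) / overall for aa in total}


def neighbor_propensity(seq, propensity, half_window=3):
    """Mean propensity of residues within +/- half_window of each position.

    The centre residue is excluded, so the score describes only its sequence
    neighbourhood. Residue types unseen in training get a neutral 1.0
    (hypothetical choices, not taken from the paper).
    """
    n = len(seq)
    scores = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        neigh = [propensity.get(seq[j], 1.0) for j in range(lo, hi) if j != i]
        scores.append(sum(neigh) / len(neigh) if neigh else 1.0)
    return scores


def classification_metrics(y_true, y_pred):
    """Recall, precision and accuracy from binary labels (1 = interaction site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(pairs)
    return recall, precision, accuracy


def roc_auc(y_true, scores):
    """AUC as the probability that a random site outscores a random non-site,
    with ties counted as half (the normalised Mann-Whitney U statistic)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: two short sequences with per-residue site labels.
    seqs = ["ACDA", "AAC"]
    labs = [[1, 0, 0, 1], [0, 1, 0]]
    prop = residue_propensity(seqs, labs)
    print(prop)
    print(neighbor_propensity("ACDA", prop, half_window=1))
    print(classification_metrics([1, 0, 0, 1], [1, 1, 0, 0]))
    print(roc_auc([1, 0, 0, 1], [0.9, 0.7, 0.2, 0.6]))
```

In a ten-fold cross-validation setting, the propensity table would be estimated on the nine training folds only and the metrics computed on the held-out fold (or on predictions pooled over all folds); the abstract does not state which aggregation was used.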
Pages: 18