Revisiting the negative example sampling problem for predicting protein-protein interactions

Cited by: 55
Authors
Park, Yungki [1]
Marcotte, Edward M. [1]
Affiliations
[1] Univ Texas Austin, Inst Cellular & Mol Biol, Ctr Syst & Synthet Biol, Austin, TX 78712 USA
Funding
National Institutes of Health (USA);
Keywords
DATABASE; MAP;
DOI
10.1093/bioinformatics/btr514
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: A number of computational methods have been proposed that predict protein-protein interactions (PPIs) based on protein sequence features. Since the number of potential non-interacting protein pairs (negative PPIs) is very high both in absolute terms and in comparison to that of interacting protein pairs (positive PPIs), computational prediction methods rely upon subsets of negative PPIs for training and validation. Hence, the need arises for subset sampling for negative PPIs.
Results: We clarify that there are two fundamentally different types of subset sampling for negative PPIs. One is subset sampling for cross-validated testing, where one desires unbiased subsets so that predictive performance estimated with them can be safely assumed to generalize to the population level. The other is subset sampling for training, where one desires the subsets that best train predictive algorithms, even if these subsets are biased. We show that confusion between these two fundamentally different types of subset sampling led one study recently published in Bioinformatics to the erroneous conclusion that predictive algorithms based on protein sequence features are hardly better than random in predicting PPIs. Rather, both protein sequence features and the 'hubbiness' of interacting proteins contribute to effective prediction of PPIs. We provide guidance for appropriate use of random versus balanced sampling.
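The abstract contrasts random and balanced sampling of negative PPIs without defining either procedure. The sketch below is one possible reading, not the authors' implementation: it assumes that "random" means drawing uniformly from all non-interacting pairs, and that "balanced" means each protein appears in the negative set as often as it does in the positive set. The toy network and the names `random_negatives`, `balanced_negatives`, and `degree` are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical toy network: A and E are "hubs" with three partners each.
positives = [("A", "B"), ("A", "C"), ("A", "D"),
             ("E", "F"), ("E", "G"), ("E", "H")]

def degree(pairs):
    """Count how many pairs each protein takes part in."""
    return Counter(protein for pair in pairs for protein in pair)

def random_negatives(positives, n, rng):
    """Assumed 'random sampling': n pairs drawn uniformly from all non-positive pairs."""
    known = {frozenset(p) for p in positives}
    proteins = sorted(degree(positives))
    candidates = [pair for pair in combinations(proteins, 2)
                  if frozenset(pair) not in known]
    return rng.sample(candidates, n)

def balanced_negatives(positives, rng, max_tries=10000):
    """Assumed 'balanced sampling': negatives that preserve each protein's frequency.

    Uses simple rejection sampling: shuffle one slot per positive occurrence of
    each protein, pair adjacent slots, and retry if any pair is a self-pair,
    a known positive, or a duplicate.
    """
    known = {frozenset(p) for p in positives}
    slots = [protein for pair in positives for protein in pair]
    for _ in range(max_tries):
        rng.shuffle(slots)
        pairs = [tuple(sorted(slots[i:i + 2])) for i in range(0, len(slots), 2)]
        valid = all(a != b and frozenset((a, b)) not in known for a, b in pairs)
        if valid and len(set(pairs)) == len(pairs):
            return pairs
    raise RuntimeError("no balanced negative set found; the network may not admit one")

rng = random.Random(0)
rand_neg = random_negatives(positives, len(positives), rng)
bal_neg = balanced_negatives(positives, rng)
print("random  :", rand_neg, dict(degree(rand_neg)))
print("balanced:", bal_neg, dict(degree(bal_neg)))
```

Under these assumptions, the balanced set keeps hubs as frequent among negatives as among positives, which is exactly why it is a biased sample of the full population of non-interacting pairs, while the random set is unbiased but leaves 'hubbiness' available as a predictive signal.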
Pages: 3024-3028
Number of pages: 5