Effect of Reference Genome Selection on the Performance of Computational Methods for Genome-Wide Protein-Protein Interaction Prediction

Cited by: 18
Authors
Muley, Vijaykumar Yogesh [1, 2]
Ranjan, Akash [1 ]
Affiliations
[1] Ctr DNA Fingerprinting & Diagnost, Computat & Funct Genom Grp, Hyderabad, Andhra Pradesh, India
[2] Dr Babasaheb Ambedkar Marathwada Univ, Dept Biotechnol, Subctr, Osmanabad, Maharashtra, India
Source
PLOS ONE | 2012 / Volume 7 / Issue 07
Keywords
ESCHERICHIA-COLI; FUNCTIONAL LINKAGES; PHYLOGENETIC PROFILES; CONTEXT METHODS; GENE ORDER; NETWORKS; DATABASE; COEVOLUTION; EVOLUTION; CONSERVATION;
DOI
10.1371/journal.pone.0042057
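Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code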
07 ; 0710 ; 09 ;
Abstract
Background: Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to share an evolutionary history. This history can be traced for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs across multiple genomes, known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood, and co-occurrence of the orthologous protein-coding genes in the same cluster or operon; they are collectively known as genomic context methods. In contrast, the mirrortree method is based on the similarity of the phylogenetic trees of two interacting proteins. Comprehensive performance analyses of these methods have frequently been reported in the literature. However, very few studies provide insight into the effect of reference genome selection on the detection of meaningful protein interactions.
Methods: We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled from 565 bacteria according to the phylogenetic diversity of, and relationships between, the organisms. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in the DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods.
Conclusions: Higher performance in predicting protein-protein interactions was achievable even with 100-150 of the 565 bacterial genomes. Including archaeal genomes in the reference genome set improves performance. We find that, to obtain good performance, it is better to sample a few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such sampling allows 50-100 genomes to be selected with comparable prediction accuracy when computational resources are limited.
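Illustrative note: the idea behind phylogenetic profiling, as summarized in the abstract, can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation. It assumes binary presence/absence profiles and Pearson correlation as the similarity score, and the genome labels (`g1` to `g6`) and protein names (`protA`, `protB`) are hypothetical toy data. It shows only that the score for the same protein pair can change when a different set of reference genomes is used.

```python
from math import sqrt

# Hypothetical toy data: 1 = a homolog of the protein is present in that
# reference genome, 0 = absent. Names and values are illustrative only.
PROFILES = {
    "protA": {"g1": 1, "g2": 1, "g3": 0, "g4": 0, "g5": 1, "g6": 0},
    "protB": {"g1": 1, "g2": 1, "g3": 0, "g4": 0, "g5": 0, "g6": 1},
}


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def profile_similarity(profiles, protein_a, protein_b, reference_genomes):
    """Correlate two proteins' presence/absence profiles, restricted to
    the chosen reference genomes (assumed scoring scheme)."""
    a = [profiles[protein_a][g] for g in reference_genomes]
    b = [profiles[protein_b][g] for g in reference_genomes]
    return pearson(a, b)


all_genomes = ["g1", "g2", "g3", "g4", "g5", "g6"]
subset = ["g1", "g2", "g3", "g4"]

score_all = profile_similarity(PROFILES, "protA", "protB", all_genomes)
score_subset = profile_similarity(PROFILES, "protA", "protB", subset)
print(f"all genomes: {score_all:.3f}, subset: {score_subset:.3f}")
```

In this toy case the two profiles agree perfectly on the four-genome subset but only partly on all six genomes, so the choice of reference set alone changes the score.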
Pages: 13