Effect of Reference Genome Selection on the Performance of Computational Methods for Genome-Wide Protein-Protein Interaction Prediction

Cited by: 18
Authors
Muley, Vijaykumar Yogesh [1, 2]
Ranjan, Akash [1]
Affiliations
[1] Ctr DNA Fingerprinting & Diagnost, Computat & Funct Genom Grp, Hyderabad, Andhra Pradesh, India
[2] Dr Babasaheb Ambedkar Marathwada Univ, Dept Biotechnol, Subctr, Osmanabad, Maharashtra, India
Source
PLOS ONE | 2012 / Volume 7 / Issue 07
Keywords
ESCHERICHIA-COLI; FUNCTIONAL LINKAGES; PHYLOGENETIC PROFILES; CONTEXT METHODS; GENE ORDER; NETWORKS; DATABASE; COEVOLUTION; EVOLUTION; CONSERVATION;
DOI
10.1371/journal.pone.0042057
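Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07; 0710; 09;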
Abstract
Background: Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to share an evolutionary history. This history can be traced for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs across multiple genomes, known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood, and co-occurrence of the orthologous protein-coding genes in the same cluster or operon; they are collectively known as genomic context methods. A further method, called mirrortree, is based on the similarity of the phylogenetic trees of two interacting proteins. Comprehensive performance analyses of these methods have frequently been reported in the literature. However, very few studies provide insight into the effect of reference genome selection on the detection of meaningful protein interactions.
Methods: We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled from 565 bacteria in accordance with phylogenetic diversity and the relationships between organisms. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in the DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods.
Conclusions: Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of the 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that, to obtain good performance, it is better to sample a few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such sampling allows 50-100 genomes to be selected for comparable prediction accuracy when computational resources are limited.
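To make the idea of phylogenetic profiling concrete, the following is a minimal sketch, not the authors' implementation. It assumes each protein is summarized as a binary presence/absence vector over a set of reference genomes and scores protein pairs by Pearson correlation; the abstract does not specify the scoring function, so this choice is an assumption. The protein names (protA to protD), the six-genome toy reference set, and the helper names pearson and rank_pairs are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors.

    Returns 0.0 when either vector is constant, since a protein that is
    present (or absent) in every reference genome carries no co-occurrence
    signal.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_pairs(profiles):
    """Score every protein pair and return (name1, name2, score) tuples,
    sorted from most to least correlated."""
    scored = [
        (p, q, pearson(profiles[p], profiles[q]))
        for p, q in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical toy data: 1 = a homolog is present in that reference genome.
profiles = {
    "protA": [1, 1, 0, 0, 1, 0],
    "protB": [1, 1, 0, 0, 1, 0],  # same pattern as protA
    "protC": [0, 0, 1, 1, 0, 1],  # complementary pattern to protA
    "protD": [1, 1, 1, 1, 1, 1],  # present everywhere: uninformative
}

ranked = rank_pairs(profiles)
for name1, name2, score in ranked:
    print(f"{name1}-{name2}: {score:+.2f}")
```

In this toy example the pair with identical profiles ranks first, while the ubiquitous protD scores zero against everything. This hints at why the choice of reference genomes matters: a reference set in which most proteins are always present, or always absent, leaves little variation for the profiles to correlate.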
Pages: 13
Related papers
50 in total
  • [31] Current progress on the computational methods for prediction of host-pathogen protein-protein interaction in the Ganoderma boninense-oil palm pathosystem
    Khairi, Mohamad Hazwan Fikri
    Muhammad, Nor Azlan Nor
    Bunawan, Hamidun
    Daud, Kauthar Mohd
    Sulaiman, Suhaila
    Mohamed-Hussein, Zeti-Azura
    Wong, Mui-Yun
    Ramzi, Ahmad Bazli
    PHYSIOLOGICAL AND MOLECULAR PLANT PATHOLOGY, 2024, 129
  • [32] Boosting Prediction Performance of Protein-Protein Interaction Hot Spots by Using Structural Neighborhood Properties
    Deng, Lei
    Guan, Jihong
    Wei, Xiaoming
    Yi, Yuan
    Zhang, Qiangfeng Cliff
    Zhou, Shuigeng
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2013, 20 (11) : 878 - 891
  • [33] Genome-wide screening and identification of long noncoding RNAs and their interaction with protein coding RNAs in bladder urothelial cell carcinoma
    Wang, Longxin
    Fu, Dian
    Qiu, Yongbin
    Xing, Xiaoxiao
    Xu, Feng
    Han, Conghui
    Xu, Xiaofeng
    Wei, Zhifeng
    Zhang, Zhengyu
    Ge, Jingping
    Cheng, Wen
    Xie, Hai-Long
    CANCER LETTERS, 2014, 349 (01) : 77 - 86
  • [34] Genome-wide pathway analysis implicates intracellular transmembrane protein transport in Alzheimer disease
    Hong, Mun-Gwan
    Alexeyenko, Andrey
    Lambert, Jean-Charles
    Amouyel, Philippe
    Prince, Jonathan A.
    JOURNAL OF HUMAN GENETICS, 2010, 55 (10) : 707 - 709
  • [35] SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale
    Nepusz, Tamas
    Sasidharan, Rajkumar
    Paccanaro, Alberto
    BMC BIOINFORMATICS, 2010, 11
  • [36] Genome-wide identification of major protein families of cyanobacteria and genomic insight into the circadian rhythm
    Mohanta, Tapan Kumar
    Pudake, Ramesh N.
    Bae, Hanhong
    EUROPEAN JOURNAL OF PHYCOLOGY, 2017, 52 (02) : 149 - 165
  • [37] Genome-wide screening for virulent candidate secreted effector protein macromolecules in Magnaporthe oryzae
    Liu, Jiazong
    Dong, Hongyang
    Wang, Yi
    Liu, Chunyan
    Wang, Ziming
    Xu, Qiyue
    Li, Wendi
    Zheng, Yuxiu
    Liang, Suochen
    Zhao, Haipeng
    Li, Yang
    Yin, Ziyi
    Ding, Xinhua
    INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES, 2025, 304
  • [38] Genome-Wide CRISPRi Screening of Key Genes for Recombinant Protein Expression in Bacillus Subtilis
    Zhu, Xuyang
    Luo, Hui
    Yu, Xinrui
    Lv, Huihui
    Su, Lingqia
    Zhang, Kang
    Wu, Jing
    ADVANCED SCIENCE, 2024,
  • [39] Genome-wide association for major depressive disorder: a possible role for the presynaptic protein piccolo
    Sullivan, P. F.
    de Geus, E. J. C.
    Willemsen, G.
    James, M. R.
    Smit, J. H.
    Zandbelt, T.
    Arolt, V.
    Baune, B. T.
    Blackwood, D.
    Cichon, S.
    Coventry, W. L.
    Domschke, K.
    Farmer, A.
    Fava, M.
    Gordon, S. D.
    He, Q.
    Heath, A. C.
    Heutink, P.
    Holsboer, F.
    Hoogendijk, W. J.
    Hottenga, J. J.
    Hu, Y.
    Kohli, M.
    Lin, D.
    Lucae, S.
    MacIntyre, D. J.
    Maier, W.
    McGhee, K. A.
    McGuffin, P.
    Montgomery, G. W.
    Muir, W. J.
    Nolen, W. A.
    Noethen, M. M.
    Perlis, R. H.
    Pirlo, K.
    Posthuma, D.
    Rietschel, M.
    Rizzu, P.
    Schosser, A.
    Smit, A. B.
    Smoller, J. W.
    Tzeng, J-Y
    van Dyck, R.
    Verhage, M.
    Zitman, F. G.
    Martin, N. G.
    Wray, N. R.
    Boomsma, D. I.
    Penninx, B. W. J. H.
    MOLECULAR PSYCHIATRY, 2009, 14 (04) : 359 - 375
  • [40] Protein-protein interaction prediction methods: from docking-based to AI-based approaches
    Tsuchiya, Yuko
    Yamamori, Yu
    Tomii, Kentaro
    BIOPHYSICAL REVIEWS, 2022, 14 (06) : 1341 - 1348