Proteinortho: Detection of (Co-)orthologs in large-scale analysis

Cited by: 809
Authors
Lechner, Marcus [1, 2]
Findeiss, Sven [2, 4]
Steiner, Lydia [2, 3, 4]
Marz, Manja [1]
Stadler, Peter F. [2, 4, 5, 6, 7, 8, 9]
Prohaska, Sonja J. [3, 4]
Affiliations
[1] Univ Marburg, RNA Bioinformat Grp, Dept Pharmaceut Chem, D-35037 Marburg, Germany
[2] Bioinformat Grp, Dept Comp Sci, D-04107 Leipzig, Germany
[3] Univ Leipzig, Dept Comp Sci, Bioinformat EvoDevo Grp, D-04107 Leipzig, Germany
[4] Univ Leipzig, Interdisciplinary Ctr Bioinformat, D-04107 Leipzig, Germany
[5] Max Planck Inst Math Sci, D-04103 Leipzig, Germany
[6] Fraunhofer Inst Cell Therapy & Immunol, D-04103 Leipzig, Germany
[7] Univ Vienna, Inst Theoret Chem, A-1090 Vienna, Austria
[8] Univ Copenhagen, Ctr Noncoding RNA Technol & Hlth, DK-1870 Frederiksberg, Denmark
[9] Santa Fe Inst, Santa Fe, NM 87501 USA
Source
BMC BIOINFORMATICS | 2011 / Volume 12
Keywords
ORTHOLOGOUS GROUPS; GENOME; GENES; ORTHOMCL; PARALOGS; CLUSTERS; DATABASE;
DOI
10.1186/1471-2105-12-124
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools as brute-force approaches with quadratic memory requirements become infeasible in practice. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases. Results: The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes. Conclusions: Proteinortho significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.
Pages: 9