Remote homology and the functions of metagenomic dark matter

Cited by: 27
Authors
Lobb, Briallen [1]
Kurtz, Daniel A. [1]
Moreno-Hagelsieb, Gabriel [2]
Doxey, Andrew C. [1]
Affiliations
[1] Univ Waterloo, Dept Biol, Waterloo, ON N2L 3G1, Canada
[2] Wilfrid Laurier Univ, Dept Biol, Waterloo, ON N2L 3C5, Canada
Keywords
ESCHERICHIA-COLI; PROTEIN; ORFANS; ORIGIN; GENES; EVOLUTION; INSIGHTS; ORPHANS; FAMILY; VIEW
DOI
10.3389/fgene.2015.00234
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Predicted open reading frames (ORFs) that lack detectable homology to known proteins are termed ORFans. Despite their prevalence in metagenomes, the extent to which ORFans encode real proteins, the degree to which they can be annotated, and their functional contributions remain unclear. To address these questions, we applied sensitive remote-homology detection methods to functionally analyze ORFans from soil, marine, and human gut metagenome collections. ORFans were identified, clustered into sequence families, and annotated through profile-profile comparison to proteins of known structure. We found that a considerable number of metagenomic ORFans (73,896 of 484,121, 15.3%) exhibit significant remote homology to structurally characterized proteins, providing a means for ORFan functional profiling. The extent of detected remote homology far exceeds that obtained for artificial protein families (1.4%). As expected for real genes, the predicted functions of ORFans are significantly similar to the functions of their gene neighbors (p < 0.001). Compared to the functional profiles predicted through standard homology searches, ORFans show biologically intriguing differences. Many ORFan-enriched functions are virus-related and tend to reflect biological processes associated with extreme sequence diversity. Each environment also possesses a large number of unique ORFan families and functions, including some known to play important community roles such as gut microbial polysaccharide digestion. Lastly, ORFans are a valuable resource for finding novel enzymes of interest, as we demonstrate through the identification of hundreds of novel ORFan metalloproteases that all possess a signature catalytic motif despite a general lack of similarity to known proteins. Our ORFan functional predictions are a valuable resource for discovering novel protein families and exploring the boundaries of protein sequence space.
All remote homology predictions are available at http://doxey.uwaterloo.ca/ORFans.
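As a quick check of the figures quoted in the abstract, the following minimal sketch recomputes the annotated fraction (73,896 of 484,121) and compares it with the 1.4% rate reported for artificial protein families. The function names and the fold-over-background ratio are illustrative assumptions; the abstract reports only the two percentages, not a ratio.

```python
# Counts and rates as quoted in the abstract.
ANNOTATED_ORFANS = 73_896      # ORFans with significant remote homology
TOTAL_ORFANS = 484_121         # all metagenomic ORFans analyzed
ARTIFICIAL_RATE = 0.014        # 1.4% hit rate for artificial protein families


def annotated_fraction(annotated: int, total: int) -> float:
    """Return the fraction of ORFans with a remote-homology hit (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return annotated / total


def fold_over_background(rate: float, background: float) -> float:
    """Return how many times larger `rate` is than `background`.

    This ratio is a derived illustration, not a statistic from the paper.
    """
    if background <= 0:
        raise ValueError("background must be positive")
    return rate / background


fraction = annotated_fraction(ANNOTATED_ORFANS, TOTAL_ORFANS)
fold = fold_over_background(fraction, ARTIFICIAL_RATE)

print(f"Annotated ORFans: {fraction:.1%}")
print(f"Fold over artificial families: {fold:.1f}x")
```

Under these assumptions the recomputed fraction rounds to the 15.3% stated in the abstract, roughly an order of magnitude above the artificial-family background.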
Pages: 12