NCycDB: a curated integrative database for fast and accurate metagenomic profiling of nitrogen cycling genes

Cited by: 241
Authors
Tu, Qichao [1]
Lin, Lu [1]
Cheng, Lei [2]
Deng, Ye [3, 4]
He, Zhili [5, 6]
Affiliations
[1] Shandong Univ, Inst Marine Sci & Technol, Qingdao, Shandong, Peoples R China
[2] Zhejiang Univ, Coll Life Sci, Dept Ecol, Hangzhou, Zhejiang, Peoples R China
[3] Chinese Acad Sci, Res Ctr Ecoenvironm Sci, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, Coll Resources & Environm, Beijing, Peoples R China
[5] Sun Yat Sen Univ, Sch Environm Sci & Engn, Dept Environm Sci, Guangzhou, Guangdong, Peoples R China
[6] Hunan Agr Univ, Coll Agr, Dept Agr, Changsha, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
COMMUNITIES; DIVERSITY; PATTERNS; ARCHAEA; PERSPECTIVE; ANNOTATION; EVOLUTION; ABUNDANCE; GRADIENT; BACTERIA;
DOI
10.1093/bioinformatics/bty741
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The nitrogen (N) cycle is a collection of important biogeochemical pathways in the Earth's ecosystem and has received extensive attention in ecology and environmental studies. Shotgun metagenome sequencing is now widely applied to explore the gene families responsible for N cycle processes. However, applying publicly available orthology databases to profile N cycle gene families in shotgun metagenomes raises problems, such as inefficient database searching, unspecific orthology groups and low coverage of N cycle genes and/or gene (sub)families.

Results: To solve these issues, this study built a manually curated integrative database (NCycDB) for fast and accurate profiling of N cycle gene (sub)families from shotgun metagenome sequencing data. NCycDB contains a total of 68 gene (sub)families and covers eight N cycle processes, with 84 759 and 219 146 representative sequences at 95% and 100% identity cutoffs, respectively. We also identified 1958 homologous orthology groups and included the corresponding sequences in the database to avoid false positive assignments due to 'small database' issues. We applied NCycDB to characterize N cycle gene (sub)families in 52 shotgun metagenomes from the Global Ocean Sampling expedition. Further analysis showed that the structure and composition of N cycle gene families were most strongly correlated with latitude and temperature. NCycDB is expected to facilitate N cycle studies via shotgun metagenome sequencing approaches in various environments. The framework developed in this study can serve as a good reference for building similar knowledge-based functional gene databases for other processes and pathways.

Availability and implementation: NCycDB database files are available at https://github.com/qichao1984/NCyc.

Supplementary information: Supplementary data are available at Bioinformatics online.
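The role of the homologous sequences described in the abstract can be illustrated with a minimal sketch. It is not the NCycDB pipeline: the function name `profile_gene_families`, the hit tuples and the mapping dictionary are all hypothetical, and the sketch only assumes that each read has already been aligned against a database containing both target and homolog sequences. A read is counted for a gene family only when its best-scoring hit is a target sequence, so reads that match a homolog better are discarded instead of being forced onto an N cycle gene.

```python
from collections import Counter


def profile_gene_families(hits, seq_to_family):
    """Count reads per gene family using each read's best hit.

    hits: iterable of (read_id, subject_id, bitscore) tuples, for example
          parsed from a tabular alignment output (hypothetical input shape).
    seq_to_family: dict mapping subject_id to a gene family name, or to None
          for homolog (non-target) sequences kept only as decoys.
    """
    best = {}
    for read_id, subject_id, bitscore in hits:
        # Keep only the highest-scoring hit for each read.
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (subject_id, bitscore)

    counts = Counter()
    for subject_id, _ in best.values():
        family = seq_to_family.get(subject_id)
        # Reads whose best hit is a homolog (or unmapped) are not counted.
        if family is not None:
            counts[family] += 1
    return counts


# Illustrative data only: read r1 matches a homolog better than nifH.
example_hits = [
    ("r1", "s1", 100.0),
    ("r1", "h1", 150.0),
    ("r2", "s2", 80.0),
    ("r3", "s1", 60.0),
]
example_map = {"s1": "nifH", "s2": "amoA", "h1": None}
example_counts = profile_gene_families(example_hits, example_map)
```

With a database that lacked the homolog `h1`, read `r1` would have been assigned to `nifH`; this is the kind of 'small database' false positive that the added homologous orthology groups are meant to prevent.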
Pages: 1040-1048
Page count: 9
Related papers
49 in total
[1]   Brown JH, 2004, ECOLOGY, V85, P1771, DOI 10.1890/03-9000
[2]   Global biogeography of SAR11 marine bacteria [J].
Brown, Mark V. ;
Lauro, Federico M. ;
DeMaere, Matthew Z. ;
Muir, Les ;
Wilkins, David ;
Thomas, Torsten ;
Riddle, Martin J. ;
Fuhrman, Jed A. ;
Andrews-Pfannkoch, Cynthia ;
Hoffman, Jeffrey M. ;
McQuaid, Jeffrey B. ;
Allen, Andrew ;
Rintoul, Stephen R. ;
Cavicchioli, Ricardo .
MOLECULAR SYSTEMS BIOLOGY, 2012, 8
[3]   Determinants of the distribution of nitrogen-cycling microbial communities at the landscape scale [J].
Bru, D. ;
Ramette, A. ;
Saby, N. P. A. ;
Dequiedt, S. ;
Ranjard, L. ;
Jolivet, C. ;
Arrouays, D. ;
Philippot, L. .
ISME JOURNAL, 2011, 5 (03) :532-542
[4]   Fast and sensitive protein alignment using DIAMOND [J].
Buchfink, Benjamin ;
Xie, Chao ;
Huson, Daniel H. .
NATURE METHODS, 2015, 12 (01) :59-60
[5]   The Evolution and Future of Earth's Nitrogen Cycle [J].
Canfield, Donald E. ;
Glazer, Alexander N. ;
Falkowski, Paul G. .
SCIENCE, 2010, 330 (6001) :192-196
[6]   nifH pyrosequencing reveals the potential for location-specific soil chemistry to influence N2-fixing community dynamics [J].
Collavino, Monica M. ;
Tripp, H. James ;
Frank, Ildiko E. ;
Vidoz, Maria L. ;
Calderoli, Priscila A. ;
Donato, Mariano ;
Zehr, Jonathan P. ;
Mario Aguilar, O. .
ENVIRONMENTAL MICROBIOLOGY, 2014, 16 (10) :3211-3223
[7]   Nitrification driven by bacteria and not archaea in nitrogen-rich grassland soils [J].
Di, H. J. ;
Cameron, K. C. ;
Shen, J. P. ;
Winefield, C. S. ;
O'Callaghan, M. ;
Bowatte, S. ;
He, J. Z. .
NATURE GEOSCIENCE, 2009, 2 (09) :621-624
[8]   Search and clustering orders of magnitude faster than BLAST [J].
Edgar, Robert C. .
BIOINFORMATICS, 2010, 26 (19) :2460-2461
[9]   Evolution of the nitrogen cycle and its influence on the biological sequestration of CO2 in the ocean [J].
Falkowski, PG .
NATURE, 1997, 387 (6630) :272-275
[10]   The Pfam protein families database: towards a more sustainable future [J].
Finn, Robert D. ;
Coggill, Penelope ;
Eberhardt, Ruth Y. ;
Eddy, Sean R. ;
Mistry, Jaina ;
Mitchell, Alex L. ;
Potter, Simon C. ;
Punta, Marco ;
Qureshi, Matloob ;
Sangrador-Vegas, Amaia ;
Salazar, Gustavo A. ;
Tate, John ;
Bateman, Alex .
NUCLEIC ACIDS RESEARCH, 2016, 44 (D1) :D279-D285