EBI Metagenomics in 2017: enriching the analysis of microbial communities, from sequence reads to assemblies

Cited by: 144
Authors
Mitchell, Alex L. [1]
Scheremetjew, Maxim [1]
Denise, Hubert [1]
Potter, Simon [1]
Tarkowska, Aleksandra [1]
Qureshi, Matloob [1]
Salazar, Gustavo A. [1]
Pesseat, Sebastien [1]
Boland, Miguel A. [1]
Hunter, Fiona M. I. [1]
ten Hoopen, Petra [1]
Alako, Blaise [1]
Amid, Clara [1]
Wilkinson, Darren J. [2]
Curtis, Thomas P. [3]
Cochrane, Guy [1]
Finn, Robert D. [1]
Affiliations
[1] EMBL EBI European Bioinformat Inst, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England
[2] Newcastle Univ, Sch Math & Stat, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[3] Newcastle Univ, Sch Civil Engn & Geosci, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
Funding
Biotechnology and Biological Sciences Research Council (UK); Innovate UK project;
Keywords
RNA GENE DATABASE; GENOME; TOOL;
DOI
10.1093/nar/gkx967
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
EBI metagenomics (http://www.ebi.ac.uk/metagenomics) provides a free-to-use platform for the analysis and archiving of sequence data derived from the microbial populations found in a particular environment. Over the past two years, EBI metagenomics has increased the number of datasets analysed 10-fold. In addition to increased throughput, the underlying analysis pipeline has been overhauled to include new or updated tools and reference databases. Of particular note is a new workflow for taxonomic assignments that has been extended to include assignments based on both the large and small subunit RNA marker genes and to encompass all cellular micro-organisms. We also describe the addition of metagenomic assembly as a new analysis service. Our pilot studies have produced over 2400 assemblies from datasets in the public domain. From these assemblies, we have produced a searchable, non-redundant protein database of over 50 million sequences. To improve access to the data stored within the resource, we have developed a programmatic interface that exposes the analysis results and associated sample metadata. Finally, we have integrated the results of a series of statistical analyses that provide estimations of diversity and sample comparisons.
Pages: D726-D735
Page count: 10