MG-RAST version 4-lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis

Cited by: 87
Authors
Meyer, Folker [1, 2, 3, 4]
Bagchi, Saurabh [5]
Chaterji, Somali [6]
Gerlach, Wolfgang [1, 7]
Grama, Ananth [8, 9]
Harrison, Travis [1, 7]
Paczian, Tobias [1, 7]
Trimble, William L. [1]
Wilke, Andreas [1, 7]
Affiliations
[1] Argonne Natl Lab, Argonne, IL 60439 USA
[2] Univ Chicago, Dept Med, Chicago, IL 60637 USA
[3] Univ Chicago, Computat Inst, Chicago, IL 60637 USA
[4] Argonne Natl Lab, Div Biol, Argonne, IL 60439 USA
[5] Purdue Univ, Sch Elect & Comp Engn, Dept Comp Sci, W Lafayette, IN 47907 USA
[6] Purdue Univ, Res Fac, W Lafayette, IN 47907 USA
[7] Univ Chicago, Chicago, IL 60637 USA
[8] Purdue Univ, Comp Sci, W Lafayette, IN 47907 USA
[9] Natl Sci Fdn, Ctr Sci Informat, Sci & Technol Ctr, Alexandria, VA USA
Funding
US National Science Foundation; US National Institutes of Health
Keywords
metagenome analysis; cloud; distributed workflows; ALIGNMENT; INFORMATION; ALGORITHM; ACCURACY; DATABASE;
DOI
10.1093/bib/bbx105
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service that constantly runs large-volume scientific workflows, MG-RAST is the right location to perform benchmarking and to implement algorithmic or platform improvements, in many cases involving trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1-3] is an example; we use existing well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as a platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements of the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners; they will enable double barcoding and stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard to specify bioinformatics workflows, to facilitate both development and efficient high-performance implementation of the community's data analysis tasks.
Pages: 1151-1159
Page count: 9
Related papers
56 in total
[1]   The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update [J].
Afgan, Enis ;
Baker, Dannon ;
van den Beek, Marius ;
Blankenberg, Daniel ;
Bouvier, Dave ;
Cech, Martin ;
Chilton, John ;
Clements, Dave ;
Coraor, Nate ;
Eberhard, Carl ;
Gruening, Bjoern ;
Guerler, Aysam ;
Hillman-Jackson, Jennifer ;
Von Kuster, Greg ;
Rasche, Eric ;
Soranzo, Nicola ;
Turaga, Nitesh ;
Taylor, James ;
Nekrutenko, Anton ;
Goecks, Jeremy .
NUCLEIC ACIDS RESEARCH, 2016, 44 (W1) :W3-W10
[2]  
Alexandre R, 2013, INSTANT APACHE SOLR
[3]  
Alneberg J, 2014, NAT METHODS, V11, P1144, DOI [10.1038/NMETH.3103, 10.1038/nmeth.3103]
[4]  
[Anonymous], DNA SEQUENCING COSTS
[5]  
[Anonymous], 2 IEEE ACM INT S BIG
[6]  
Aronesty E., 2013, The Open Bioinformatics Journal, V7, P1, DOI [DOI 10.2174/1875036201307010001, 10.2174/1875036201307010001]
[7]   Why linked data is not enough for scientists [J].
Bechhofer, Sean ;
Buchan, Iain ;
De Roure, David ;
Missier, Paolo ;
Ainsworth, John ;
Bhagat, Jiten ;
Couch, Philip ;
Cruickshank, Don ;
Delderfield, Mark ;
Dunlop, Ian ;
Gamble, Matthew ;
Michaelides, Danius ;
Owen, Stuart ;
Newman, David ;
Sufi, Shoaib ;
Goble, Carole .
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2013, 29 (02) :599-611
[8]  
Bent J, 2009, CHECKPOINT FILESYSTE
[9]   Metazen - metadata capture for metagenomes [J].
Bischof, Jared ;
Harrison, Travis ;
Paczian, Tobias ;
Glass, Elizabeth ;
Wilke, Andreas ;
Meyer, Folker .
STANDARDS IN GENOMIC SCIENCES, 2014, 9 (01)
[10]   Fast and sensitive protein alignment using DIAMOND [J].
Buchfink, Benjamin ;
Xie, Chao ;
Huson, Daniel H. .
NATURE METHODS, 2015, 12 (01) :59-60