Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

Cited by: 450
Authors
Bradnam, Keith R. [1]
Fass, Joseph N. [1]
Alexandrov, Anton [36]
Baranay, Paul [2]
Bechner, Michael [39]
Birol, Inanc [33]
Boisvert, Sebastien [10]
Chapman, Jarrod A. [19]
Chapuis, Guillaume [7, 9]
Chikhi, Rayan [7, 9]
Chitsaz, Hamidreza [6]
Chou, Wen-Chi [13, 15]
Corbeil, Jacques [12]
Del Fabbro, Cristian [16]
Docking, T. Roderick [33]
Durbin, Richard [34]
Earl, Dent [40]
Emrich, Scott [3]
Fedotov, Pavel [36]
Fonseca, Nuno A. [29, 35]
Ganapathy, Ganeshkumar [38]
Gibbs, Richard A. [31, 32]
Gnerre, Sante [21]
Godzaridis, Elenie [10]
Goldstein, Steve [39]
Haimel, Matthias [29]
Hall, Giles [21]
Haussler, David [40]
Hiatt, Joseph B. [41]
Ho, Isaac Y. [19]
Howard, Jason [38]
Hunt, Martin [34]
Jackman, Shaun D. [33]
Jaffe, David B. [21]
Jarvis, Erich D. [38]
Jiang, Huaiyang [31, 32]
Kazakov, Sergey [36]
Kersey, Paul J. [29]
Kitzman, Jacob O. [41]
Knight, James R. [37]
Koren, Sergey [23, 24]
Lam, Tak-Wah [28]
Lavenier, Dominique [7, 8, 9]
Laviolette, Francois [11]
Li, Yingrui [27, 28]
Li, Zhenyu [27]
Liu, Binghang [27]
Liu, Yue [31, 32]
Luo, Ruibang [27, 28]
MacCallum, Iain [21]
Affiliations
[1] Univ Calif Davis, Genome Ctr, Davis, CA 95616 USA
[2] Yale Univ, New Haven, CT USA
[3] Univ Notre Dame, Dept Comp Sci & Engn, South Bend, IN 46556 USA
[4] Cold Spring Harbor Lab, Simons Ctr Quantitat Biol, Cold Spring Harbor, NY 11724 USA
[5] Univ Calif Berkeley, Berkeley Calif Inst Quantitat Biosci, Berkeley, CA 94720 USA
[6] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA
[7] IRISA, ENS Cachan, Computer Sci Dept, F-35042 Rennes, France
[8] INRIA, Rennes Bretagne Atlant, F-35042 Rennes, France
[9] IRISA, CNRS, F-35042 Rennes, France
[10] Univ Laval, Fac Med, Quebec City, PQ G1V 4G2, Canada
[11] Univ Laval, Fac Sci & Engn, Dept Comp Sci & Software Engn, Quebec City, PQ, Canada
[12] Univ Laval, Fac Med, Dept Mol Med, Quebec City, PQ G1V 4G2, Canada
[13] Univ Georgia, Inst Bioinformat, Athens, GA 30602 USA
[14] Univ Georgia, Coll Publ Hlth, Dept Epidemiol & Biostat, Athens, GA 30602 USA
[15] Hebrew SeniorLife, Inst Aging Res, Boston, MA 02131 USA
[16] IGA, I-33100 Udine, Italy
[17] Univ Udine, Dept Math & Comp Sci, I-33100 Udine, Italy
[18] KTH Royal Inst Technol, Sci Life Lab, S-17121 Solna, Sweden
[19] DOE Joint Genome Inst, Walnut Creek, CA USA
[20] Univ Calif Berkeley, Dept Mol & Cell Biol, Berkeley, CA 94720 USA
[21] Broad Inst, Cambridge, MA USA
[22] New York Genome Ctr, New York, NY 10022 USA
[23] Nat Biodefense Anal & Countermeasures Ctr, Frederick, MD 21702 USA
[24] Univ Maryland, Ctr Bioinformat & Computat Biol, College Pk, MD 20742 USA
[25] Univ Calif San Francisco, Dept Biochem & Biophys, San Francisco, CA 94143 USA
[26] Howard Hughes Med Inst, Bethesda, MD 20814 USA
[27] BGI Shenzhen, Shenzhen 518083, Guangdong, Peoples R China
[28] Univ Hong Kong, HKU BGI Bioinformat Algorithms & Core Technol Res, Hong Kong, Hong Kong, Peoples R China
[29] EMBL European Bioinformat Inst, Cambridge CB10 1SD, England
[30] Univ Lisbon, Computat Biol & Populat Genom Grp, Ctr Environm Biol, Dept Anim Biol,Fac Sci, P-1749016 Lisbon, Portugal
[31] Baylor Coll Med, Human Genome Sequencing Ctr, Houston, TX 77030 USA
[32] Baylor Coll Med, Dept Mol & Human Genet, Houston, TX 77030 USA
[33] British Columbia Canc Agcy, Genome Sci Ctr, Vancouver, BC V5Z 4E6, Canada
[34] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[35] CRACS INESC TEC, P-4200465 Oporto, Portugal
[36] Univ ITMO, Nat Res Univ Informat Technol Mech & Opt, St Petersburg 197101, Russia
[37] Life Sci, Branford, CT 06405 USA
[38] Duke Univ, Med Ctr, Durham, NC 27710 USA
[39] UW Biotechnol Ctr, Dept Chem & Genet, Lab Mol & Computat Gen, Madison, WI USA
[40] Univ Calif Santa Cruz, Howard Hughes Med Inst, Ctr Biomol Sci & Engn, Santa Cruz, CA 95064 USA
[41] Univ Washington, Sch Med, Dept Gen Sci, Seattle, WA 98195 USA
Source
GIGASCIENCE | 2013 / Vol. 2
Funding
National Science Foundation (USA);
Keywords
Assessment; COMPASS; Genome assembly; Heterozygosity; N50; Scaffolds;
DOI
10.1186/2047-217X-2-10
Chinese Library Classification (CLC) number
Q [Biological sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly.
Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies.
Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
Pages: 31
Related papers
69 in total
[1] Bentley D.R., Whole-genome re-sequencing, Curr Opin Genet Dev, 16, pp. 545-552, (2006)
[2] Haussler D., O'Brien S.J., Ryder O.A., Barker F.K., Clamp M., Crawford A.J., Hanner R., Hanotte O., Johnson W.E., McGuire J.A., Genome 10K: a proposal to obtain whole-genome sequence for 10 000 vertebrate species, J Hered, 100, pp. 659-674, (2009)
[3] Kumar S., Schiffer P.H., Blaxter M., 959 Nematode Genomes: a semantic wiki for coordinating sequencing projects, Nucleic Acids Res, 40, pp. D1295-D1300, (2012)
[4] Pevzner P.A., Tang H., Waterman M.S., An Eulerian path approach to DNA fragment assembly, Proc Natl Acad Sci, 98, pp. 9748-9753, (2001)
[5] Butler J., MacCallum I., Kleber M., Shlyakhter I.A., Belmonte M.K., Lander E.S., Nusbaum C., Jaffe D.B., ALLPATHS: De novo assembly of whole-genome shotgun microreads, Genome Res, 18, pp. 810-820, (2008)
[6] Zerbino D.R., Birney E., Velvet: algorithms for de novo short read assembly using de Bruijn graphs, Genome Res, 18, pp. 821-829, (2008)
[7] Simpson J.T., Wong K., Jackman S.D., Schein J.E., Jones S.J.M., Birol I., ABySS: a parallel assembler for short read sequence data, Genome Res, 19, pp. 1117-1123, (2009)
[8] Luo R., Liu B., Xie Y., Li Z., Huang W., Yuan J., He G., Chen Y., Pan Q., Liu Y., SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler, GigaScience, 1, (2012)
[9] Li R., Fan W., Tian G., Zhu H., He L., Cai J., Huang Q., Cai Q., Li B., Bai Y., Zhang Z., Zhang Y., Wang W., Li J., Wei F., Li H., Jian M., Li J., Zhang Z., Nielsen R., Li D., Gu W., Yang Z., Xuan Z., Ryder O.A., Leung F.C.-C., Zhou Y., Cao J., Sun X., Fu Y., et al., The sequence and de novo assembly of the giant panda genome, Nature, 463, pp. 311-317, (2010)
[10] Simpson J.T., Durbin R., Efficient de novo assembly of large genomes using compressed data structures, Genome Res, 22, 3, pp. 549-556, (2011)