Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data

Cited by: 16
Authors
Finotello, Francesca [2 ]
Lavezzo, Enrico [1 ]
Fontana, Paolo [3 ]
Peruzzo, Denis [2 ]
Albiero, Alessandro
Barzon, Luisa [1 ]
Falda, Marco [4 ]
Di Camillo, Barbara [2 ]
Toppo, Stefano [4 ]
Affiliations
[1] Univ Padua, Dept Histol Microbiol & Med Biotechnol, Padua, Italy
[2] Univ Padua, Dept Informat Engn, Padua, Italy
[3] Edmund Mach Fdn San Michele all'Adige, Bioinformat Grp, Trento, Italy
[4] Univ Padua, Sch Med, Dept Biol Chem, Padua, Italy
Keywords
assembly algorithm assessment; bacterial genome; 454; pyrosequencing; coverage; SEQUENCING TECHNOLOGY; PYROPHOSPHATE; REPEATS; PROGRAM; READS;
DOI
10.1093/bib/bbr063
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Next-generation sequencing technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. In this context, an important issue is the need for a careful assessment of the accuracy of the assembly process. Here, we review the efficiency of a panel of assemblers, specifically designed to handle data from the GS FLX 454 platform, on three bacterial data sets with different characteristics in terms of read coverage and repeat content. Our aim is to investigate their strengths and weaknesses in the reconstruction of the reference genomes. In our benchmarking, we assess the assemblers' performance by quantifying and characterizing assembly gaps and errors, and by evaluating their ability to resolve complex genomic regions containing repeats. The final goal of this analysis is to highlight the pros and cons of each method, in order to provide the end user with general criteria for choosing the appropriate assembly strategy, depending on specific needs. A further aspect we have explored is the relationship between the coverage of a sequencing project and the quality of the obtained results. The final outcome suggests that, for a good tradeoff between costs and results, the planned genome coverage of an experiment should not exceed 20-30x.
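To make the coverage figure in the abstract concrete, the following is a minimal sketch of the standard depth-of-coverage arithmetic (total sequenced bases divided by genome size). It is not taken from the paper; the function names, the 2 Mb genome size, and the 400 bp mean read length are hypothetical values chosen only for illustration.

```python
import math


def expected_coverage(num_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * mean_read_length / genome_size


def reads_for_target(target_coverage: float, mean_read_length: float, genome_size: int) -> int:
    """Smallest number of reads whose total length reaches the target coverage."""
    if mean_read_length <= 0:
        raise ValueError("mean_read_length must be positive")
    return math.ceil(target_coverage * genome_size / mean_read_length)


if __name__ == "__main__":
    # Hypothetical bacterial genome and read length (assumptions, not paper data).
    genome_size = 2_000_000   # bp
    mean_read_length = 400    # bp
    for target in (20, 25, 30):
        n = reads_for_target(target, mean_read_length, genome_size)
        print(f"{target}x needs about {n} reads")
```

Under these assumed values, planning a run in the 20-30x range amounts to choosing a read count between the two extremes printed by the loop; real projects would also need to account for read-length variation and uneven coverage, which this sketch ignores.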
Pages: 269-280
Page count: 12