De novo finished 2.8 Mbp Staphylococcus aureus genome assembly from 100 bp short and long range paired-end reads

Citations: 42
Authors
Hernandez, David [1]
Tewhey, Ryan [2]
Veyrieras, Jean-Baptiste [3]
Farinelli, Laurent [4]
Osteras, Magne [4]
Francois, Patrice [1]
Schrenzel, Jacques [1]
Affiliations
[1] Univ Hosp Geneva, Genom Res Lab, Infect Dis Serv, CH-1211 Geneva 4, Switzerland
[2] Scripps Res Inst, Scripps Translat Sci Inst, La Jolla, CA 92037 USA
[3] BioMerieux, Data & Knowledge Lab, F-69280 Marcy Letoile, France
[4] Fasteris SA, CH-1228 Plan Les Ouates, Switzerland
Funding
Swiss National Science Foundation;
Keywords
DRAFT ASSEMBLIES; SEQUENCE; GENERATION;
DOI
10.1093/bioinformatics/btt590
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Motivation: Paired-end sequencing compensates for the short reads produced by second-generation sequencers and is essential for de novo genome assembly. However, obtaining a finished genome from short reads remains an open challenge. We present an algorithm that exploits the pairing information from inserts of potentially any length. The method determines paths through an overlap graph using a constrained search tree. We also present a method that automatically determines suitable overlap cutoffs according to the contextual coverage, thus reducing the need for manual parameterization. Finally, we introduce an interactive mode that allows an assembly to be queried at targeted regions.
Results: We assess our methods by assembling two Staphylococcus aureus strains sequenced on the Illumina platform. Using 100 bp paired-end reads and minimal manual curation, we produce a finished genome sequence for the previously undescribed isolate SGH-10-168.
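The abstract's idea of finding paths through an overlap graph under a paired-end constraint can be illustrated with a small sketch. This is not the authors' implementation: the graph representation, the function name `constrained_paths`, the read names and the distance bounds are all hypothetical, chosen only to show how an insert-size window can prune a search tree.

```python
from typing import Dict, List, Tuple

# Hypothetical overlap graph: read -> list of (next_read, offset), where offset is
# the number of bases by which next_read starts after the current read.
Graph = Dict[str, List[Tuple[str, int]]]


def constrained_paths(graph: Graph, start: str, mate: str,
                      min_dist: int, max_dist: int) -> List[List[str]]:
    """Enumerate paths from `start` to `mate` whose cumulative offset lies in
    [min_dist, max_dist]. Branches exceeding max_dist are pruned, which is the
    'constraint' on the search tree in this illustrative sketch."""
    results: List[List[str]] = []
    stack = [(start, 0, [start])]  # (current read, distance so far, path so far)
    while stack:
        node, dist, path = stack.pop()
        if node == mate and min_dist <= dist <= max_dist:
            results.append(path)
        for succ, offset in graph.get(node, []):
            if offset <= 0:
                raise ValueError("offsets must be positive so the search terminates")
            new_dist = dist + offset
            if new_dist <= max_dist:  # prune: the mate can no longer fit the insert size
                stack.append((succ, new_dist, path + [succ]))
    return results


# Toy example (made-up reads): two branches leave r1, as at a repeat boundary.
graph: Graph = {
    "r1": [("r2", 40), ("r3", 40)],
    "r2": [("r4", 50)],
    "r3": [("r5", 50)],
    "r4": [("mate", 60)],
    "r5": [("r6", 300)],
    "r6": [("mate", 300)],
}

# Only the branch through r2 places the mate 100 to 200 bases downstream.
print(constrained_paths(graph, "r1", "mate", 100, 200))
```

With a wide window such as 0 to 1000 bases both branches are accepted, which illustrates why a tight insert-size distribution helps resolve ambiguous branches.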
Pages: 40-49
Page count: 10