Identification of Optimum Sequencing Depth Especially for De Novo Genome Assembly of Small Genomes Using Next Generation Sequencing Data

Cited by: 64
Authors
Desai, Aarti [1]
Marwah, Veer Singh [1]
Yadav, Akshay [1]
Jha, Vineet [1]
Dhaygude, Kishor [1]
Bangar, Ujwala [1]
Kulkarni, Vivek [1]
Jere, Abhay [1]
Affiliation
[1] Persistent Syst Ltd, Persistent LABS, Pune, Maharashtra, India
Keywords
SHORT DNA-SEQUENCES; ALGORITHMS; DISCOVERY; GENETICS
DOI
10.1371/journal.pone.0060204
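Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification code
07; 0710; 09
Abstract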
Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly from these short reads is computationally very intensive. Owing to lower sequencing cost and higher throughput, NGS systems can now sequence genomes at high depth. However, no report is currently available on the impact of high sequence depth on genome assembly using real datasets and multiple assembly algorithms. Some recent studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed on simulated datasets. One limitation of simulated datasets is that variables known to affect genome assembly, such as error rate, read length and coverage, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly of different-sized genomes using graph-based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes with all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM, depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable to researchers in designing experiments and multiplexing, enabling optimum utilization of sequencing and analysis resources.
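To make the depth figures in the abstract concrete, the sketch below applies the standard definition of average sequencing depth, depth = (number of reads × read length) / genome size, to the three genome sizes quoted above. It is an illustration only, not the authors' pipeline: the 100 bp read length and the function names `reads_for_depth` and `depth_from_reads` are assumptions introduced here, and the genome sizes are the rounded values from the abstract.

```python
import math

# Approximate genome sizes in base pairs, as quoted in the abstract.
GENOME_SIZES_BP = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}


def reads_for_depth(genome_size_bp: int, depth: float, read_length_bp: int) -> int:
    """Number of reads needed to reach an average depth.

    Rearranges depth = reads * read_length / genome_size and rounds up.
    """
    return math.ceil(depth * genome_size_bp / read_length_bp)


def depth_from_reads(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth obtained from a given number of reads."""
    return num_reads * read_length_bp / genome_size_bp


if __name__ == "__main__":
    READ_LENGTH_BP = 100  # assumption: not stated in the abstract
    TARGET_DEPTH = 50     # the depth the abstract reports as optimum for most assemblers
    for name, size_bp in GENOME_SIZES_BP.items():
        n_reads = reads_for_depth(size_bp, TARGET_DEPTH, READ_LENGTH_BP)
        print(f"{name}: about {n_reads:,} reads of {READ_LENGTH_BP} bp for {TARGET_DEPTH}X")
```

The same arithmetic can guide multiplexing: dividing a run's expected read yield by the per-sample read count gives a rough upper bound on how many samples can share a run at the target depth, before accounting for reads lost to quality filtering.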
Pages: 11