Redundans: an assembly pipeline for highly heterozygous genomes

Cited by: 371
Authors
Pryszcz, Leszek P. [1, 2]
Gabaldon, Toni [1, 3, 4]
Affiliations
[1] Barcelona Inst Sci & Technol, Ctr Genom Regulat CRG, Dr Aiguader 88, Barcelona 08003, Spain
[2] Int Inst Mol & Cell Biol, Warsaw, Poland
[3] Univ Pompeu Fabra, Barcelona 08003, Spain
[4] Inst Catalana Recerca & Estudis Avancats, Pg Lluis Co 23, Barcelona 08010, Spain
Keywords
DE-NOVO ASSEMBLER; SEQUENCE DATA; ALIGNMENT; REPEATS;
DOI
10.1093/nar/gkw294
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Many genomes display high levels of heterozygosity (i.e. the presence of different alleles at the same loci in homologous chromosomes), with those of hybrid organisms being an extreme case. The assembly of highly heterozygous genomes from short sequencing reads is challenging because it is difficult to accurately recover the different haplotypes. When confronted with highly heterozygous genomes, the standard assembly process tends to collapse homozygous regions and report heterozygous regions as alternative contigs. The boundaries between homozygous and heterozygous regions result in multiple assembly paths that are hard to resolve, which leads to highly fragmented assemblies with a total size larger than expected. This, in turn, causes numerous problems in downstream analyses, such as fragmented gene models, wrong gene copy numbers, or broken synteny. To circumvent these problems, we have developed a pipeline that specifically deals with the assembly of heterozygous genomes by introducing a step to recognise and selectively remove alternative heterozygous contigs. We tested our pipeline on simulated and naturally occurring heterozygous genomes and compared its accuracy to that of other existing tools. Our method is freely available at https://github.com/Gabaldonlab/redundans.
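The idea of recognising and selectively removing alternative heterozygous contigs can be illustrated with a minimal sketch. This is not the Redundans implementation: the `Hit` record, the `reduce_contigs` function, and the threshold values are hypothetical, and the pairwise alignments are assumed to have been computed beforehand by some aligner. The sketch drops the shorter contig of a pair when it is both similar enough to, and sufficiently covered by, a longer contig.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise alignment between two contigs (hypothetical record)."""
    query: str       # contig being tested for redundancy
    target: str      # contig it aligns to
    identity: float  # fraction of identical bases within the aligned region
    aligned: int     # number of query bases covered by the alignment


def reduce_contigs(lengths, hits, min_identity=0.9, min_overlap=0.8):
    """Return the names of contigs kept after removing likely alternative alleles.

    lengths: dict mapping contig name -> contig length in bases.
    hits: iterable of Hit records.
    The thresholds are placeholder values chosen for illustration only.
    """
    removed = set()
    # Process hits of the shortest queries first.
    for hit in sorted(hits, key=lambda h: lengths[h.query]):
        if hit.query == hit.target:
            continue  # ignore self-alignments
        if hit.query in removed or hit.target in removed:
            continue  # a removed contig cannot be removed again or justify a removal
        if lengths[hit.query] > lengths[hit.target]:
            continue  # only the shorter contig of a pair is a removal candidate
        overlap = hit.aligned / lengths[hit.query]
        if hit.identity >= min_identity and overlap >= min_overlap:
            removed.add(hit.query)
    return set(lengths) - removed


# Toy example: contig "b" is a near-identical, mostly covered copy of "a";
# contig "c" aligns to "a" but is too divergent to be treated as an allele.
lengths = {"a": 1000, "b": 900, "c": 500}
hits = [Hit("b", "a", 0.95, 850), Hit("c", "a", 0.60, 450)]
kept = reduce_contigs(lengths, hits)
reduced_size = sum(lengths[name] for name in kept)
```

In this toy case the total assembly size shrinks from 2400 to 1500 bases, which mirrors the abstract's point that unreduced heterozygous assemblies are larger than expected.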
Pages: 10