IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth

Cited by: 2185
Authors
Peng, Yu [1]
Leung, Henry C. M. [1]
Yiu, S. M. [1]
Chin, Francis Y. L. [1]
Affiliations
[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Keywords
BACTERIAL GENOMES; READS;
DOI
10.1093/bioinformatics/bts174
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Next-generation sequencing allows us to sequence reads from a microbial environment using single-cell sequencing or metagenomic sequencing technologies. However, both technologies suffer from the problem that the sequencing depths of different regions of a genome, or of genomes from different species, are highly uneven. Most existing genome assemblers assume that sequencing depths are even, and they fail to construct correct long contigs from such data. Results: We introduce the IDBA-UD algorithm, which is based on the de Bruijn graph approach, for assembling reads from single-cell sequencing or metagenomic sequencing technologies with uneven sequencing depths. Several non-trivial techniques have been employed to tackle the problems. Instead of using a simple threshold, we use multiple depth-relative thresholds to remove erroneous k-mers in both low-depth and high-depth regions. The technique of local assembly with paired-end information is used to solve the branch problem of low-depth short repeat regions. To speed up the process, an error correction step is conducted to correct reads of high-depth regions that can be aligned to high-confidence contigs. Comparison of the performance of IDBA-UD and existing assemblers (Velvet, Velvet-SC, SOAPdenovo and Meta-IDBA) on different datasets shows that IDBA-UD can reconstruct longer contigs with higher accuracy.
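The contrast the abstract draws between a simple threshold and depth-relative thresholds can be illustrated with a toy sketch. This is not the IDBA-UD implementation: the segment names, the depths, the adjacency map, the single `ratio` parameter and both function names are hypothetical, chosen only to show why one global cutoff cannot serve low-depth and high-depth regions at once.

```python
def filter_global(depth, threshold):
    """Keep every segment whose depth reaches one fixed cutoff."""
    return {name for name, d in depth.items() if d >= threshold}


def filter_depth_relative(depth, adjacent, ratio):
    """Keep a segment unless its depth is below `ratio` times the mean
    depth of the segments adjacent to it in the graph (assumed rule)."""
    kept = set()
    for name, d in depth.items():
        nbrs = adjacent.get(name, [])
        if not nbrs:
            # No neighbours to compare against: keep the segment.
            kept.add(name)
            continue
        local = sum(depth[n] for n in nbrs) / len(nbrs)
        if d >= ratio * local:
            kept.add(name)
    return kept


# Hypothetical depths: one genuine low-depth segment, and one erroneous
# segment (depth 6) sitting between two high-depth segments.
depth = {"low_true": 4, "high_a": 200, "high_err": 6, "high_b": 210}
adjacent = {
    "low_true": [],
    "high_a": ["high_err", "high_b"],
    "high_err": ["high_a", "high_b"],
    "high_b": ["high_a", "high_err"],
}

global_kept = filter_global(depth, threshold=5)
relative_kept = filter_depth_relative(depth, adjacent, ratio=0.1)
print(sorted(global_kept))
print(sorted(relative_kept))
```

In this toy data any global cutoff low enough to keep `low_true` (depth 4) also keeps `high_err` (depth 6), whereas the relative rule removes `high_err` because it is far below its neighbours' depth and leaves `low_true` alone.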
Pages: 1420-1428
Number of pages: 9