Multi-CAR: a tool of contig scaffolding using multiple references

Cited by: 9
Authors
Chen, Kun-Tze [1]
Chen, Cheih-Jung [1]
Shen, Hsin-Ting [1]
Liu, Chia-Liang [1]
Huang, Shang-Hao [1]
Lu, Chin Lung [1]
Affiliations
[1] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu 30013, Taiwan
Source
BMC BIOINFORMATICS | 2016 / Vol. 17
Keywords
Bioinformatics; Next-generation sequencing; Contigs; Scaffolding; Multiple references; GENOMES; ASSEMBLIES; ALGORITHM;
DOI
10.1186/s12859-016-1328-7
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: A draft genome assembled by current next-generation sequencing techniques from short reads is just a collection of contigs, whose relative positions and orientations along the genome being sequenced are unknown. To further obtain its complete sequence, a contig scaffolding process is usually applied to order and orient the contigs in the draft genome. Although several single reference-based scaffolding tools have been proposed, they may produce erroneous scaffolds if there are rearrangements between the target and reference genomes or their phylogenetic relationship is distant. This may suggest that a single reference genome may not be sufficient to produce correct scaffolds of a draft genome.

Results: In this study, we design a simple heuristic method to further revise our single reference-based scaffolding tool CAR into a new one called Multi-CAR such that it can utilize multiple complete genomes of related organisms as references to more accurately order and orient the contigs of a draft genome. In practical usage, our Multi-CAR does not require prior knowledge concerning phylogenetic relationships among the draft and reference genomes and libraries of paired-end reads. To validate Multi-CAR, we have tested it on a real dataset composed of several prokaryotic genomes and also compared its accuracy performance with other multiple reference-based scaffolding tools Ragout and MeDuSa. Our experimental results have finally shown that Multi-CAR indeed outperforms Ragout and MeDuSa in terms of sensitivity, precision, genome coverage, scaffold number and scaffold N50 size.

Conclusions: Multi-CAR serves as an efficient tool that can more accurately order and orient the contigs of a draft genome based on multiple reference genomes. The web server of Multi-CAR is freely available at http://genome.cs.nthu.edu.tw/Multi-CAR/.
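The abstract compares tools by sensitivity, precision and scaffold N50 size without defining them. The sketch below illustrates one common way such metrics are computed; it is not taken from the paper. It assumes N50 is the length of the scaffold at which the running total of lengths, sorted from longest to shortest, first reaches half of the total, and that sensitivity and precision are measured over contig joins (adjacent contig pairs) against a reference ordering. The function names and the representation of a join as a tuple of contig identifiers are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Assumed definition: sort lengths in descending order and return the
    length at which the cumulative sum first reaches half of the total.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def join_accuracy(predicted_joins, reference_joins):
    """Return (sensitivity, precision) for a set of predicted contig joins.

    Assumed definitions (not confirmed by the abstract):
      sensitivity = correct joins / joins in the reference ordering
      precision   = correct joins / joins reported by the scaffolder
    Each join is a hypothetical (contig_a, contig_b) tuple.
    """
    predicted = set(predicted_joins)
    reference = set(reference_joins)
    correct = len(predicted & reference)
    sensitivity = correct / len(reference) if reference else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Toy scaffold lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6]))
    # Toy joins: one of two predicted joins appears in the reference.
    reference = {("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c4", "c5")}
    predicted = {("c1", "c2"), ("c2", "c4")}
    print(join_accuracy(predicted, reference))
```

Under these assumed definitions, a scaffolder that reports few but correct joins scores high precision and low sensitivity, which is why the two values are usually read together with scaffold number and N50.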
Pages: 8