Multi-CSAR: a multiple reference-based contig scaffolder using algebraic rearrangements

Cited by: 9
Authors
Chen, Kun-Tze [1]
Shen, Hsin-Ting [1]
Lu, Chin Lung [1]
Affiliations
[1] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu 30013, Taiwan
Keywords
Bioinformatics; Sequencing; Contig; Scaffolding; Multiple reference genomes; GENOMES; TOOL; ALGORITHM
DOI
10.1186/s12918-018-0654-y
Chinese Library Classification number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
Background: One of the important steps in assembling a genome sequence from short reads is scaffolding, in which the contigs in a draft genome are ordered and oriented into scaffolds. Several scaffolding tools based on a single reference genome have been developed. However, a single reference genome may not be sufficient for a scaffolder to generate correct scaffolds of a target draft genome, especially when the evolutionary relationship between the target and reference genomes is distant or rearrangements have occurred between them. This motivates the development of scaffolding tools that can order and orient the contigs of the target genome using multiple reference genomes.
Results: In this work, we use a heuristic method to develop a new scaffolder called Multi-CSAR that can accurately scaffold a target draft genome based on multiple reference genomes, none of which needs to be complete. Our experimental results on real datasets show that Multi-CSAR outperforms two other multiple reference-based scaffolding tools, Ragout and MeDuSa, on many average metrics, such as sensitivity, precision, F-score, genome coverage, NGA50, scaffold number and running time.
Conclusions: Multi-CSAR is a multiple reference-based scaffolder that can efficiently produce more accurate scaffolds of a target draft genome by referring to multiple complete and/or incomplete genomes of related organisms. Its stand-alone program is available for download at https://github.com/ablab-nthu/Multi-CSAR.
Pages: 7