MEDUSA: a multi-draft based scaffolder

Cited by: 295
Authors
Bosi, Emanuele [1, 2]
Donati, Beatrice [3, 4, 5]
Galardini, Marco [6]
Brunetti, Sara [7]
Sagot, Marie-France [3, 4, 8]
Lio, Pietro [9]
Crescenzi, Pierluigi [5]
Fani, Renato [1, 2]
Fondi, Marco [1, 2]
Affiliations
[1] Univ Florence, Florence Computat Biol Grp, ComBo, Dept Biol, I-50019 Sesto Fiorentino, Italy
[2] Univ Florence, LEMM, Lab Microbial & Mol Evolut Florence, Dept Biol, I-50019 Sesto Fiorentino, Italy
[3] INRIA Rhone Alpes, Villeurbanne, France
[4] Univ Lyon, F-69000 Lyon, France
[5] Univ Florence, Dipartimento Ingn Informaz, I-50139 Florence, Italy
[6] EMBL EBI, Cambridge CB10 1SD, England
[7] Univ Siena, Dipartimento Ingn Informaz & Sci Matemat, I-53100 Siena, Italy
[8] Univ Lyon 1, CNRS, UMR5558, F-69622 Villeurbanne, France
[9] Univ Cambridge, Comp Lab, Cambridge CB3 0FD, England
Funding
European Research Council
Keywords
ALGORITHM; SOFTWARE; GENOMES; READS
DOI
10.1093/bioinformatics/btv171
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Completing the genome sequence of an organism is an important task in comparative, functional and structural genomics. However, this remains a challenging issue from both a computational and an experimental viewpoint. Genome scaffolding (i.e. the process of ordering and orientating contigs) of de novo assemblies usually represents the first step in most genome finishing pipelines.

Results: In this article we present MEDUSA (Multi-Draft based Scaffolder), an algorithm for genome scaffolding. MEDUSA exploits information obtained from a set of (draft or closed) genomes from related organisms to determine the correct order and orientation of the contigs. MEDUSA formalizes the scaffolding problem by means of a combinatorial optimization formulation on graphs and implements an efficient constant factor approximation algorithm to solve it. In contrast to currently used scaffolders, it requires neither prior knowledge of the microorganism dataset under analysis (e.g. their phylogenetic relationships) nor the availability of paired-end read libraries. This makes usability and running time two additional important features of our method. Moreover, benchmarks and tests on real bacterial datasets showed that MEDUSA is highly accurate and, in most cases, outperforms traditional scaffolders. The possibility to use MEDUSA on eukaryotic datasets has also been evaluated, leading to interesting results.
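
Illustrative sketch (not taken from the article): the abstract describes scaffolding as a combinatorial optimization problem on a graph. One way to picture such a formulation is a graph whose nodes are contigs and whose weighted edges are candidate adjacencies, from which a set of vertex-disjoint paths (scaffolds) is selected. The greedy heuristic below is only an assumption made for illustration and is not claimed to be the algorithm MEDUSA implements. The function name `greedy_path_cover`, the edge weights (imagined here as the number of related genomes supporting an adjacency) and the toy data are all hypothetical, and contig orientation is ignored.

```python
def greedy_path_cover(n_contigs, weighted_edges):
    """Select heavy edges so that the chosen edges form vertex-disjoint paths.

    n_contigs      -- number of contigs, labelled 0 .. n_contigs - 1
    weighted_edges -- iterable of (u, v, weight) candidate adjacencies
                      (hypothetical weights, e.g. count of supporting genomes)
    """
    parent = list(range(n_contigs))  # union-find forest, used to detect cycles
    degree = [0] * n_contigs         # a node inside a path has at most 2 neighbours

    def find(x):
        # Find the component representative, halving paths along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    # Visit candidate adjacencies from heaviest to lightest.
    for u, v, w in sorted(weighted_edges, key=lambda e: -e[2]):
        if degree[u] >= 2 or degree[v] >= 2:
            continue  # would create a branch, so the result would not be a path
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            continue  # would close a cycle (also rejects self-loops)
        parent[root_u] = root_v
        degree[u] += 1
        degree[v] += 1
        chosen.append((u, v, w))
    return chosen


# Toy input: four contigs and four candidate adjacencies.
toy_edges = [(0, 1, 5), (1, 2, 4), (0, 2, 3), (1, 3, 2)]
scaffold_edges = greedy_path_cover(4, toy_edges)
```

On this toy input the two heaviest edges are kept, giving the path 0-1-2; the edge (0, 2) is rejected because it would close a cycle, the edge (1, 3) because contig 1 already has two neighbours, and contig 3 is left as a scaffold of its own.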
Pages: 2443-2451
Page count: 9