On the (im)possibility of reconstructing plasmids from whole-genome short-read sequencing data

Cited by: 165
Authors
Arredondo-Alonso, Sergio [1]
Willems, Rob J. [1]
van Schaik, Willem [1, 2]
Schurch, Anita C. [1]
Affiliations
[1] Univ Med Ctr Utrecht, Dept Med Microbiol, Utrecht, Netherlands
[2] Univ Birmingham, Inst Microbiol & Infect, Birmingham, W Midlands, England
Source
MICROBIAL GENOMICS | 2017 / Vol. 3 / Issue 10
Keywords
plasmids; mobile genetic elements; DNA sequence analysis; bacterial genomes; replicon benchmarking; CHROMOSOME
DOI
10.1099/mgen.0.000128
Chinese Library Classification number
Q3 [Genetics]
Subject classification code
071007; 090102
Abstract
To benchmark algorithms for automated plasmid sequence reconstruction from short-read sequencing data, we selected 42 publicly available complete bacterial genome sequences spanning 12 genera, containing 148 plasmids. We predicted plasmids from short-read data with four programs (PlasmidSPAdes, Recycler, cBar and PlasmidFinder) and compared the outcome to the reference sequences. PlasmidSPAdes reconstructs plasmids based on coverage differences in the assembly graph. It reconstructed most of the reference plasmids (recall=0.82), but approximately a quarter of the predicted plasmid contigs were false positives (precision=0.75). PlasmidSPAdes merged 84% of the predictions from genomes with multiple plasmids into a single bin. Recycler searches the assembly graph for sub-graphs corresponding to circular sequences and correctly predicted small plasmids, but failed with long plasmids (recall=0.12, precision=0.30). cBar, which applies pentamer frequency analysis to detect plasmid-derived contigs, showed a recall and precision of 0.76 and 0.62, respectively. However, cBar categorizes contigs as plasmid-derived and does not bin the different plasmids. PlasmidFinder, which searches for replicons, had the highest precision (1.0), but was restricted by the contents of its database and the contig length obtained from de novo assembly (recall=0.36). PlasmidSPAdes and Recycler detected putative small plasmids (< 10 kbp), which were also predicted as plasmids by cBar, but were absent in the original assembly. This study shows that it is possible to automatically predict small plasmids. Prediction of large plasmids (> 50 kbp) containing repeated sequences remains challenging and limits the high-throughput analysis of plasmids from short-read whole-genome sequencing data.
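The recall and precision figures quoted in the abstract can be illustrated with a minimal sketch. It assumes the standard contig-level definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)); the abstract does not state exactly how the authors computed their values, and the function name and contig IDs below are hypothetical toy data, not results from the study.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for two collections of contig IDs.

    predicted: contigs a tool labelled as plasmid-derived (hypothetical input)
    truth:     contigs that really belong to a reference plasmid
    """
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)   # predicted and truly plasmid
    fp = len(predicted - truth)   # predicted but actually chromosomal
    fn = len(truth - predicted)   # plasmid contigs the tool missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return precision, recall


# Toy example (made-up contig IDs): 3 of 4 predictions are correct,
# and 3 of 4 true plasmid contigs are recovered.
truth = {"contig_1", "contig_2", "contig_3", "contig_4"}
predicted = {"contig_1", "contig_2", "contig_3", "contig_9"}
precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under these definitions, a precision of 0.75 means that one in four predicted plasmid contigs is a false positive, which matches the abstract's description of the PlasmidSPAdes result.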
Pages: 8