On the (im)possibility of reconstructing plasmids from whole-genome short-read sequencing data

Cited by: 165
Authors
Arredondo-Alonso, Sergio [1]
Willems, Rob J. [1]
van Schaik, Willem [1, 2]
Schurch, Anita C. [1]
Affiliations
[1] Univ Med Ctr Utrecht, Dept Med Microbiol, Utrecht, Netherlands
[2] Univ Birmingham, Inst Microbiol & Infect, Birmingham, W Midlands, England
Source
MICROBIAL GENOMICS | 2017 / Volume 3 / Issue 10
Keywords
plasmids; mobile genetic elements; DNA sequence analysis; bacterial genomes; replicon benchmarking; CHROMOSOME
DOI
10.1099/mgen.0.000128
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
To benchmark algorithms for automated plasmid sequence reconstruction from short-read sequencing data, we selected 42 publicly available complete bacterial genome sequences spanning 12 genera, containing 148 plasmids. We predicted plasmids from short-read data with four programs (PlasmidSPAdes, Recycler, cBar and PlasmidFinder) and compared the outcome to the reference sequences. PlasmidSPAdes reconstructs plasmids based on coverage differences in the assembly graph. It reconstructed most of the reference plasmids (recall=0.82), but approximately a quarter of the predicted plasmid contigs were false positives (precision=0.75). PlasmidSPAdes merged 84% of the predictions from genomes with multiple plasmids into a single bin. Recycler searches the assembly graph for sub-graphs corresponding to circular sequences and correctly predicted small plasmids, but failed with long plasmids (recall=0.12, precision=0.30). cBar, which applies pentamer frequency analysis to detect plasmid-derived contigs, showed a recall and precision of 0.76 and 0.62, respectively. However, cBar categorizes contigs as plasmid-derived and does not bin the different plasmids. PlasmidFinder, which searches for replicons, had the highest precision (1.0), but was restricted by the contents of its database and the contig length obtained from de novo assembly (recall=0.36). PlasmidSPAdes and Recycler detected putative small plasmids (< 10 kbp), which were also predicted as plasmids by cBar, but were absent in the original assembly. This study shows that it is possible to automatically predict small plasmids. Prediction of large plasmids (> 50 kbp) containing repeated sequences remains challenging and limits the high-throughput analysis of plasmids from short-read whole-genome sequencing data.
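The recall and precision figures quoted in the abstract can be read with the standard definitions: precision is the fraction of predicted plasmid sequence that is truly plasmid-derived, and recall is the fraction of true plasmid sequence that was recovered. The following is a minimal sketch under that assumption; the function name `precision_recall` and the counts are hypothetical illustrations, not values or code from the paper, and the abstract does not state the exact unit (contigs or base pairs) over which the authors counted.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from raw counts.

    tp: predicted plasmid items that are truly plasmid-derived
    fp: predicted plasmid items that are actually chromosomal
    fn: true plasmid items the tool failed to predict
    """
    predicted = tp + fp  # everything the tool called "plasmid"
    actual = tp + fn     # everything that really is plasmid
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Hypothetical counts chosen only to illustrate the arithmetic.
p, r = precision_recall(tp=30, fp=10, fn=10)
print(f"precision={p:.2f} recall={r:.2f}")
```

With these definitions, a precision of 0.75 means that about one in four predicted items is a false positive, which matches the wording used for PlasmidSPAdes above.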
Pages: 8