PlasClass improves plasmid sequence classification

Citations: 60
Authors
Pellow, David [1]
Mizrahi, Itzik [2, 3]
Shamir, Ron [1]
Affiliations
[1] Tel Aviv Univ, Blavatnik Sch Comp Sci, Tel Aviv, Israel
[2] Ben Gurion Univ Negev, Dept Life Sci, Beer Sheva, Israel
[3] Natl Inst Biotechnol Negev, Marcus Family Campus, Beer Sheva, Israel
Funding
European Research Council; US National Science Foundation; Israel Science Foundation
DOI
10.1371/journal.pcbi.1007781
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Many bacteria contain plasmids, but distinguishing contigs that originate from a plasmid from those that are part of the bacterial genome can be difficult. This is especially true in metagenomic assembly, which yields many contigs of unknown origin. Existing tools for classifying sequences of plasmid origin give less reliable results for shorter sequences, are trained using a fraction of the known plasmids, and can be difficult to use in practice. We present PlasClass, a new plasmid classifier. It uses a set of standard classifiers, each trained for a different sequence length on the most current set of known plasmid sequences. We tested PlasClass sequence classification on held-out data and simulations, as well as on publicly available bacterial isolates, plasmidome samples, and plasmids assembled from metagenomic samples. PlasClass outperforms the state-of-the-art plasmid classification tool on shorter sequences, which constitute the majority of assembly contigs, allowing it to achieve higher F1 scores in classifying sequences from a wide range of datasets. PlasClass also uses significantly less time and memory. PlasClass can be used to easily classify plasmid and bacterial genome sequences in metagenomic or isolate assemblies. It is available under the MIT license from: https://github.com/Shamir-Lab/PlasClass.
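The idea of using separate classifiers for different sequence lengths, and of scoring results with F1, can be illustrated with a minimal sketch. This is not the PlasClass implementation: the length bins, the k-mer size, and the function names `kmer_frequencies`, `pick_length_bin`, and `f1_score` are all assumptions chosen for illustration, and the abstract does not specify the features or bin lengths actually used.

```python
from collections import Counter
from itertools import product

# Hypothetical reference lengths (bp), one per length-specific model.
# The abstract does not state the actual values used by PlasClass.
LENGTH_BINS = [1_000, 10_000, 100_000]


def kmer_frequencies(seq, k=3):
    """Return normalized k-mer frequencies over the ACGT alphabet.

    K-mers containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def pick_length_bin(seq_len, bins=LENGTH_BINS):
    """Choose the bin whose reference length is closest to seq_len."""
    return min(bins, key=lambda b: abs(b - seq_len))


def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 = plasmid and 0 = chromosome."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In such a scheme, each contig would be converted to a feature vector with `kmer_frequencies` and passed to whichever trained model `pick_length_bin` selects for its length, so that short contigs are scored by a model trained on short fragments.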
Pages: 9