Next-generation sequencing (NGS) is a popular method for assessing the molecular diversity of microbial communities without cultivation, for identifying polymorphisms in populations, and for comparing genomes and transcriptomes. However, sequence-specific errors (SSEs) introduced by NGS systems can result in genome mis-assembly, overestimation of diversity in microbial community analyses, and false polymorphism discovery. SSEs are particularly problematic in samples with rich microbial biodiversity and in genomes containing frequent repeats. In this study, SSEs in public data from all popular NGS systems were detected using a Markov chain model, and hotspots for sequencing errors were identified. In non-Illumina NGS systems such as GS FLX+, deletion errors were frequently preceded by homopolymers. In Illumina sequencing systems such as HiSeq, substitution errors were often associated with high GC content and long G/C homopolymers. After removal of long G/C homopolymers from HiSeq data, both the average contig length and the average SNP quality increased. SSEs were selectively removed from our mock community data by quality filtering, and a bias against specific microbes was identified. Our findings provide a scientific basis for filtering poor-quality reads, correcting deletion errors, preventing genome mis-assembly, and accurately assessing microbial community compositions and polymorphisms.
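As an illustration of the G/C homopolymer removal step mentioned above, the following is a minimal sketch and not the authors' pipeline. The function names and the run-length threshold `min_run=8` are assumptions chosen for the example; the abstract does not state what length counts as a "long" homopolymer.

```python
import re


def has_long_gc_homopolymer(read: str, min_run: int = 8) -> bool:
    """Return True if the read contains a run of G or C of at least min_run bases.

    min_run is a hypothetical threshold; the source does not specify one.
    """
    # Builds a pattern such as "G{8,}|C{8,}" and matches case-insensitively.
    pattern = re.compile(rf"G{{{min_run},}}|C{{{min_run},}}", re.IGNORECASE)
    return pattern.search(read) is not None


def filter_reads(reads, min_run: int = 8):
    """Keep only the reads that do not contain a long G/C homopolymer."""
    return [r for r in reads if not has_long_gc_homopolymer(r, min_run)]


if __name__ == "__main__":
    example_reads = [
        "ACGTACGTACGT",            # no homopolymer run
        "AC" + "G" * 9 + "TT",     # run of 9 G, dropped at min_run=8
        "TTCCCCAAGG",              # short C run, kept
    ]
    for read in filter_reads(example_reads):
        print(read)
```

Discarding whole reads is the simplest choice for a sketch; trimming or masking the homopolymer region would be an alternative, and the abstract does not say which approach was used.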