Characterization of sequence-specific errors in various next-generation sequencing systems

Cited by: 28
Authors
Shin, Sunguk [1]
Park, Joonhong [1]
Affiliations
[1] Yonsei Univ, Dept Civil & Environm Engn, Yonsei Ro 50, Seoul 120749, South Korea
Funding
National Research Foundation of Singapore;
Keywords
DNA; DIVERSITY;
DOI
10.1039/c5mb00750j
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Next-generation sequencing (NGS) is a popular method for assessing the molecular diversity of microbial communities without cultivation, for identifying polymorphisms in populations, and for comparing genomes and transcriptomes. However, sequence-specific errors (SSEs) by NGS systems can result in genome mis-assembly, overestimation of diversity in microbial community analyses, and false polymorphism discovery. SSEs can be particularly problematic due to rich microbial biodiversity and genomes containing frequent repeats. In this study, SSEs in public data from all popular NGS systems were discovered using a Markov chain model and hotspots for sequence errors were identified. Deletion errors were frequently preceded by homopolymers in non-Illumina NGS systems, such as GS FLX+. Substitution errors were often related to high GC contents and long G/C homopolymers in Illumina sequencing systems such as HiSeq. After removal of long G/C homopolymers in HiSeq, the average lengths of contigs and average SNP quality increased. SSEs were selectively removed from our mock community data by quality filtering, and a bias against specific microbes was identified. Our findings provide a scientific basis for filtering poor-quality reads, correcting deletion errors, preventing genome mis-assembly, and accurately assessing microbial community compositions and polymorphisms.
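As an illustration of the kind of read filtering the abstract describes for Illumina data, the following is a minimal sketch that flags reads containing long G/C homopolymers. It is not the authors' pipeline: the function names and the `MIN_RUN` threshold are hypothetical, since the abstract does not state what run length counts as "long".

```python
import re

# Hypothetical threshold: the abstract does not define a "long" homopolymer.
MIN_RUN = 8


def longest_gc_homopolymer(seq: str) -> int:
    """Return the length of the longest run of consecutive G's or of consecutive C's."""
    runs = re.findall(r"G+|C+", seq.upper())
    return max((len(run) for run in runs), default=0)


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty read)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def filter_reads(reads, min_run: int = MIN_RUN):
    """Keep only reads whose longest G/C homopolymer is shorter than min_run."""
    return [read for read in reads if longest_gc_homopolymer(read) < min_run]
```

For example, `filter_reads(["ACGT", "AGGGGGGGGT"])` keeps only the first read under the assumed threshold of 8, because the second read contains a run of eight G's.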
Pages: 914-922
Number of pages: 9