From next-generation resequencing reads to a high-quality variant data set

Cited by: 61
Author
Pfeifer, S. P. [1, 2, 3]
Affiliations
[1] Ecole Polytech Fed Lausanne, Sch Life Sci, Lausanne, Switzerland
[2] Swiss Inst Bioinformat, Lausanne, Switzerland
[3] Arizona State Univ, Sch Life Sci, Tempe, AZ 85287 USA
Keywords
ACCURATE ERROR-CORRECTION; SEQUENCING DATA; CALLING PIPELINES; GENOMIC SEQUENCE; ALIGNMENT; DISCOVERY; ADAPTER; TOOL; ALGORITHMS; FRAMEWORK;
DOI
10.1038/hdy.2016.102
Chinese Library Classification number
Q14 [Ecology (Bioecology)];
Subject classification code
071012 ; 0713 ;
Abstract
Sequencing has revolutionized biology by permitting the analysis of genomic variation at an unprecedented resolution. High-throughput sequencing is fast and inexpensive, making it accessible for a wide range of research topics. However, the produced data contain subtle but complex types of errors, biases and uncertainties that impose several statistical and computational challenges to the reliable detection of variants. To tap the full potential of high-throughput sequencing, a thorough understanding of the data produced as well as the available methodologies is required. Here, I review several commonly used methods for generating and processing next-generation resequencing data, discuss the influence of errors and biases together with their resulting implications for downstream analyses and provide general guidelines and recommendations for producing high-quality single-nucleotide polymorphism data sets from raw reads by highlighting several sophisticated reference-based methods representing the current state of the art.
Pages: 111 - 124
Number of pages: 14
Related papers
50 in total
  • [1] From next-generation resequencing reads to a high-quality variant data set
    S P Pfeifer
    Heredity, 2017, 118 : 111 - 124
  • [2] A SNP discovery method to assess variant allele probability from next-generation resequencing data
    Shen, Yufeng
    Wan, Zhengzheng
    Coarfa, Cristian
    Drabek, Rafal
    Chen, Lei
    Ostrowski, Elizabeth A.
    Liu, Yue
    Weinstock, George M.
    Wheeler, David A.
    Gibbs, Richard A.
    Yu, Fuli
    GENOME RESEARCH, 2010, 20 (02) : 273 - 280
  • [3] Alignment of Next-Generation Sequencing Reads
    Reinert, Knut
    Langmead, Ben
    Weese, David
    Evers, Dirk J.
    ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS, VOL 16, 2015, 16 : 133 - 151
  • [4] QcReads: An Adapter and Quality Trimming Tool for Next-Generation Sequencing Reads
    Yunfei Ma
    Haibing Xie
    Xuman Han
    David M. Irwin
    Ya-Ping Zhang
    Journal of Genetics and Genomics, 2013, 40 (12) : 639 - 642
  • [5] Preparation of high-quality next-generation sequencing libraries from picogram quantities of target DNA
    Parkinson, Nicholas J.
    Maslau, Siarhei
    Ferneyhough, Ben
    Zhang, Gang
    Gregory, Lorna
    Buck, David
    Ragoussis, Jiannis
    Ponting, Chris P.
    Fischer, Michael D.
    GENOME RESEARCH, 2012, 22 (01) : 125 - 133
  • [6] Next-generation Data Hub Technology for a Data-centric Society through High-quality High-reliability Data Distribution
    Mochida S.
    Nagata T.
    NTT Technical Review, 2021, 19 (02) : 47 - 52
  • [7] Genotype calling from next-generation sequencing data using haplotype information of reads
    Zhi, Degui
    Wu, Jihua
    Liu, Nianjun
    Zhang, Kui
    BIOINFORMATICS, 2012, 28 (07) : 938 - 946
  • [8] QcReads: An Adapter and Quality Trimming Tool for Next-Generation Sequencing Reads
    Ma, Yunfei
    Xie, Haibing
    Han, Xuman
    Irwin, David M.
    Zhang, Ya-Ping
    JOURNAL OF GENETICS AND GENOMICS, 2013, 40 (12) : 639 - 642
  • [9] Whole Genome Complete Resequencing of Bacillus subtilis Natto by Combining Long Reads with High-Quality Short Reads
    Kamada, Mayumi
    Hase, Sumitaka
    Sato, Kengo
    Toyoda, Atsushi
    Fujiyama, Asao
    Sakakibara, Yasubumi
    PLOS ONE, 2014, 9 (10):
  • [10] Consensus Rules in Variant Detection from Next-Generation Sequencing Data
    Jia, Peilin
    Li, Fei
    Xia, Jufeng
    Chen, Haiquan
    Ji, Hongbin
    Pao, William
    Zhao, Zhongming
    PLOS ONE, 2012, 7 (06):