From next-generation resequencing reads to a high-quality variant data set

Cited by: 69
Author
Pfeifer, S. P. [1, 2, 3]
Affiliations
[1] Ecole Polytech Fed Lausanne, Sch Life Sci, Lausanne, Switzerland
[2] Swiss Inst Bioinformat, Lausanne, Switzerland
[3] Arizona State Univ, Sch Life Sci, Tempe, AZ 85287 USA
Keywords
ACCURATE ERROR-CORRECTION; SEQUENCING DATA; CALLING PIPELINES; GENOMIC SEQUENCE; ALIGNMENT; DISCOVERY; ADAPTER; TOOL; ALGORITHMS; FRAMEWORK
DOI
10.1038/hdy.2016.102
Chinese Library Classification number
Q14 [Ecology (bioecology)]
Discipline classification code
071012; 0713
Abstract
Sequencing has revolutionized biology by permitting the analysis of genomic variation at an unprecedented resolution. High-throughput sequencing is fast and inexpensive, making it accessible for a wide range of research topics. However, the produced data contain subtle but complex types of errors, biases and uncertainties that impose several statistical and computational challenges to the reliable detection of variants. To tap the full potential of high-throughput sequencing, a thorough understanding of the data produced as well as the available methodologies is required. Here, I review several commonly used methods for generating and processing next-generation resequencing data, discuss the influence of errors and biases together with their resulting implications for downstream analyses and provide general guidelines and recommendations for producing high-quality single-nucleotide polymorphism data sets from raw reads by highlighting several sophisticated reference-based methods representing the current state of the art.
Pages: 111-124
Page count: 14