A beginners guide to SNP calling from high-throughput DNA-sequencing data

Cited by: 66
Authors
Altmann, Andre [1]
Weber, Peter [1]
Bader, Daniel [1]
Preuss, Michael [2]
Binder, Elisabeth B. [1]
Mueller-Myhsok, Bertram [1]
Affiliations
[1] Max Planck Inst Psychiat, D-80804 Munich, Germany
[2] Univ Lubeck, Inst Med Biometrie & Stat, Lubeck, Germany
Keywords
GENOME-WIDE ASSOCIATION; RNA-SEQ; GENERATION; ALIGNMENT; BROWSER; DISCOVERY; VARIANTS; GENOTYPE; MOLECULE; FORMAT;
DOI
10.1007/s00439-012-1213-z
CLC classification number
Q3 [Genetics];
Subject classification codes
071007; 090102;
Abstract
High-throughput DNA sequencing (HTS) is of increasing importance in the life sciences. One of its most prominent applications is the sequencing of whole genomes or targeted regions of the genome such as all exonic regions (i.e., the exome). Here, the objective is the identification of genetic variants such as single nucleotide polymorphisms (SNPs). The extraction of SNPs from the raw genetic sequences involves many processing steps and the application of a diverse set of tools. We review the essential building blocks for a pipeline that calls SNPs from raw HTS data. The pipeline includes quality control, mapping of short reads to the reference genome, visualization and post-processing of the alignment including base quality recalibration. The final steps of the pipeline include the SNP calling procedure along with filtering of SNP candidates. The steps of this pipeline are accompanied by an analysis of a publicly available whole-exome sequencing dataset. To this end, we employ several alignment programs and SNP calling routines for highlighting the fact that the choice of the tools significantly affects the final results.
Pages: 1541-1554
Number of pages: 14