Accurate detection and genotyping of SNPs utilizing population sequencing data

Cited by: 78
Authors
Bansal, Vikas [1]
Harismendy, Olivier [1]
Tewhey, Ryan [1]
Murray, Sarah S. [1]
Schork, Nicholas J. [1]
Topol, Eric J. [1]
Frazer, Kelly A. [1]
Affiliations
[1] Scripps Res Inst, Scripps Translat Sci Inst, Scripps Genom Med, La Jolla, CA 92037 USA
Keywords
SHORT READ ALIGNMENT; HUMAN GENOME; RARE VARIANTS; CONTRIBUTE; IMPUTATION; ULTRAFAST; GENES; SETS;
DOI
10.1101/gr.100040.109
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Next-generation sequencing technologies have made it possible to sequence targeted regions of the human genome in hundreds of individuals. Deep sequencing represents a powerful approach for the discovery of the complete spectrum of DNA sequence variants in functionally important genomic intervals. Current methods for single nucleotide polymorphism (SNP) detection are designed to detect SNPs from single individual sequence data sets. Here, we describe a novel method, SNIP-Seq (single nucleotide polymorphism identification from population sequence data), that leverages sequence data from a population of individuals to detect SNPs and assign genotypes to individuals. To evaluate our method, we utilized sequence data from a 200-kilobase (kb) region on chromosome 9p21 of the human genome. This region was sequenced in 48 individuals (five sequenced in duplicate) using the Illumina GA platform. Using this data set, we demonstrate that our method is highly accurate for detecting variants and can filter out false SNPs that are attributable to sequencing errors. The concordance of sequencing-based genotype assignments between duplicate samples was 98.8%. The 200-kb region was independently sequenced to a high depth of coverage using two sequence pools containing the 48 individuals. Many of the novel SNPs identified by SNIP-Seq from the individual sequencing were validated by the pooled sequencing data and were subsequently confirmed by Sanger sequencing. We estimate that SNIP-Seq achieves a low false-positive rate of approximately 2%, improving upon the higher false-positive rate of existing methods that do not utilize population sequence data. Collectively, these results suggest that analysis of population sequencing data is a powerful approach for the accurate detection of SNPs and the assignment of genotypes to individual samples.
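The duplicate-sample concordance quoted in the abstract can be illustrated with a small calculation. The sketch below is not the SNIP-Seq implementation; it assumes that concordance means the fraction of sites genotyped in both replicates whose calls agree, and the function name `genotype_concordance`, the site labels, and the toy genotypes are all hypothetical.

```python
from typing import Mapping, Optional


def genotype_concordance(
    rep_a: Mapping[str, Optional[str]],
    rep_b: Mapping[str, Optional[str]],
) -> float:
    """Fraction of sites called in both replicates whose genotypes agree.

    Assumption: a genotype is a two-letter string such as "AG"; allele order
    is ignored, and None marks a site with no call.
    """
    # Only sites with a call in both replicates are compared.
    shared = [
        site for site in rep_a
        if rep_a[site] is not None and rep_b.get(site) is not None
    ]
    if not shared:
        raise ValueError("no sites with calls in both replicates")
    # Sort alleles so that "AG" and "GA" count as the same genotype.
    matches = sum(
        1 for site in shared
        if sorted(rep_a[site]) == sorted(rep_b[site])
    )
    return matches / len(shared)


# Hypothetical toy data: site3 is discordant, site4 is uncalled in replicate A.
rep_a = {"site1": "AA", "site2": "AG", "site3": "GG", "site4": None}
rep_b = {"site1": "AA", "site2": "GA", "site3": "AG", "site4": "CC"}
concordance = genotype_concordance(rep_a, rep_b)
print(f"{concordance:.3f}")
```

Under this definition, sites missing a call in either replicate are excluded from the denominator; whether the paper treats missing calls this way is not stated in the abstract.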
Pages: 537-545
Page count: 9