Dynamic model based algorithms for screening and genotyping over 100K SNPs on oligonucleotide microarrays

Cited by: 139
Authors
Di, XJ [1]
Matsuzaki, H [1]
Webster, TA [1]
Hubbell, E [1]
Liu, GY [1]
Dong, SL [1]
Bartell, D [1]
Huang, J [1]
Chiles, R [1]
Yang, G [1]
Shen, MM [1]
Kulp, D [1]
Kennedy, GC [1]
Mei, R [1]
Jones, KW [1]
Cawley, S [1]
Affiliation
[1] Affymetrix Inc, Santa Clara, CA 95051 USA
DOI
10.1093/bioinformatics/bti275
Chinese Library Classification (CLC)
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: High-density single nucleotide polymorphism (SNP) coverage of the genome is desirable, and often essential, for population genetics studies. Region-specific or chromosome-specific linkage studies also benefit from having as many high-quality SNPs as possible. The availability of millions of SNPs from both Perlegen and the public domain, together with the development of an efficient microarray-based assay for genotyping SNPs, has raised some interesting analytical challenges. Effective methods for selecting optimal subsets of SNPs spanning the genome, and for accurately calling genotypes from probe hybridization patterns, have enabled the development of a new microarray-based system for robustly genotyping over 100 000 SNPs per sample.

Results: We introduce a new dynamic model-based algorithm (DM) for screening over 3 million SNPs and genotyping over 100 000 SNPs. The model is based on four possible underlying states for each probe quartet: Null, A, AB and B. We calculate a probe-level log likelihood for each model and then select among the four competing models by statistically aggregating across multiple probe quartets at the SNP level, which yields a high-quality genotype call together with a quality measure for that call. We assess performance using HapMap reference genotypes, informative Mendelian inheritance relationships in families, and consistency between DM and another genotype classification method, MPAM. At a call rate of 95.91%, the concordance with reference genotypes from the HapMap Project is 99.81% based on over 1.5 million genotypes, the Mendelian error rate is 0.018% based on 10 trios, and the consistency between DM and MPAM is 99.90% at a comparable call rate of 97.18%. We also develop methods for SNP selection and optimal probe selection.
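The abstract describes the general shape of DM: score each probe quartet under four competing states (Null, A, AB, B), then pick a state per SNP by aggregating across quartets and attach a quality measure. The Python sketch below illustrates only that shape under loudly stated assumptions; it is not the paper's likelihood or statistic. Assumed here: log2 intensities with Gaussian noise of fixed spread (`SIGMA`); a per-quartet background taken from the two mismatch (MM) cells; a fixed homozygous signal gain (`DELTA`), halved (one log2 unit, `HET_LOSS`) for each allele of a heterozygote; SNP-level aggregation by summing log likelihoods; and a one-sided sign test, standing in for whatever aggregation statistic the paper actually uses, as the call-quality p-value. All names (`Quartet`, `quartet_logliks`, `call_snp`, `alpha`) and constants are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical constants (assumptions for illustration, not values from the paper).
SIGMA = 0.5      # noise s.d. of log2 PM intensities
DELTA = 2.0      # log2 gain of a homozygous allele's PM over background
HET_LOSS = 1.0   # a heterozygote has one copy of each allele: -1 in log2

MODELS = ("Null", "A", "AB", "B")


@dataclass
class Quartet:
    pm_a: float  # perfect-match cell for allele A (raw intensity, > 0)
    mm_a: float  # mismatch cell for allele A
    pm_b: float  # perfect-match cell for allele B
    mm_b: float  # mismatch cell for allele B


def _gauss_loglik(x, mu, sigma=SIGMA):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def quartet_logliks(q):
    """Probe-level log likelihood of one quartet under each of the four models."""
    # Assumed background: mean log2 intensity of the two mismatch cells.
    bg = (math.log2(q.mm_a) + math.log2(q.mm_b)) / 2
    xa, xb = math.log2(q.pm_a), math.log2(q.pm_b)
    het = bg + DELTA - HET_LOSS
    expected = {  # expected (PM_A, PM_B) under each model
        "Null": (bg, bg),
        "A": (bg + DELTA, bg),
        "AB": (het, het),
        "B": (bg, bg + DELTA),
    }
    return {m: _gauss_loglik(xa, ea) + _gauss_loglik(xb, eb)
            for m, (ea, eb) in expected.items()}


def sign_test_p(wins, n):
    """One-sided sign-test p-value: P(X >= wins) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, k) for k in range(wins, n + 1)) / 2 ** n


def call_snp(quartets, alpha=0.05):
    """Return (call, p): the best-scoring model, or "NoCall" if weak or Null."""
    per_q = [quartet_logliks(q) for q in quartets]
    # SNP-level aggregation (assumed): sum the probe-level log likelihoods.
    totals = {m: sum(ll[m] for ll in per_q) for m in MODELS}
    ranked = sorted(MODELS, key=totals.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    # Quality: how consistently quartets prefer the winner over the runner-up.
    wins = sum(ll[best] > ll[runner_up] for ll in per_q)
    p = sign_test_p(wins, len(per_q))
    # Assumed policy: a winning Null model is treated as a no-call.
    call = best if p <= alpha and best != "Null" else "NoCall"
    return call, p


if __name__ == "__main__":
    # Ten quartets with a strong PM_A signal and PM_B near background.
    aa = [Quartet(pm_a=4000, mm_a=500, pm_b=550, mm_b=500) for _ in range(10)]
    print(call_snp(aa))  # ('A', 0.0009765625)
```

Fixing the expected means, rather than fitting a free mean per model, keeps the richer models (such as AB) from winning automatically on every quartet. A real implementation would estimate background and signal levels from the data, which is presumably what makes the model "dynamic."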
Pages: 1958–1963
Page count: 6