Evaluation of Allele Frequency Estimation Using Pooled Sequencing Data Simulation

Cited by: 11
Authors
Guo, Yan [1]
Samuels, David C. [2]
Li, Jiang [1]
Clark, Travis [3]
Li, Chung-I [1]
Shyr, Yu [1]
Affiliations
[1] Vanderbilt Ingram Canc Ctr, Ctr Quantitat Sci, Nashville, TN USA
[2] Vanderbilt Univ, Med Ctr, Ctr Human Genet Res, Nashville, TN USA
[3] Vanderbilt Univ, VANTAGE, Nashville, TN USA
Source
SCIENTIFIC WORLD JOURNAL | 2013
Keywords
GENOME-WIDE ASSOCIATION; LINKAGE DISEQUILIBRIUM; SYNDROME LOCUS; RARE VARIANTS; HUMAN-DISEASE; DNA SAMPLES; GENES; IDENTIFICATION; POPULATIONS; MUTATIONS;
DOI
10.1155/2013/895496
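Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract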
Next-generation sequencing (NGS) technology has given researchers the opportunity to study the genome in unprecedented detail. In particular, NGS is applied to disease association studies. Unlike genotyping chips, NGS is not limited to a fixed set of SNPs. Prices for NGS are now comparable to those of SNP chips, although for large studies the cost can still be substantial. Pooling techniques are often used to reduce the overall cost of large-scale studies. In this study, we designed a rigorous simulation model to test the practicability of estimating allele frequency from pooled sequencing data. We took crucial factors into consideration, including pool size, overall depth, average depth per sample, pooling variation, and sampling variation. We used real data to demonstrate and measure reference allele preference in DNAseq data and implemented this bias in our simulation model. We found that pooled sequencing data can introduce a high relative error rate (defined as the error rate divided by the targeted allele frequency) and that the error is more severe for SNPs with low minor allele frequency than for SNPs with high minor allele frequency. To overcome the error introduced by pooling, we recommend a large pool size and a high average depth per sample.
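The factors listed in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' simulation model: the function names, the gamma-distributed pooling weights, the `ref_preference` parameterization, and all default values are hypothetical choices made for illustration. It draws diploid genotypes (sampling variation), mixes unequal DNA contributions (pooling variation), skews reads toward the reference allele, and reports the relative error as the absolute error divided by the target allele frequency.

```python
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_pooled_estimate(true_af, pool_size, depth,
                             pooling_cv=0.3, ref_preference=0.5, rng=None):
    """Return one estimated minor allele frequency from a simulated pool.

    Assumptions (illustrative only, not taken from the paper):
    - genotypes are diploid and in Hardy-Weinberg proportions;
    - each sample's DNA contribution is gamma distributed with mean 1
      and coefficient of variation `pooling_cv`;
    - `ref_preference` lies strictly between 0 and 1, and 0.5 means
      reads show no preference for the reference allele.
    """
    rng = rng or random.Random()
    # Sampling variation: minor allele count (0, 1, or 2) per individual.
    genotypes = [binomial(rng, 2, true_af) for _ in range(pool_size)]
    # Pooling variation: unequal amounts of DNA per individual.
    if pooling_cv > 0:
        shape = 1.0 / pooling_cv ** 2
        weights = [rng.gammavariate(shape, 1.0 / shape) for _ in genotypes]
    else:
        weights = [1.0] * pool_size
    pool_af = sum(w * g / 2.0 for w, g in zip(weights, genotypes)) / sum(weights)
    # Reference allele preference: reweight the chance of reading each allele.
    alt = pool_af * (1.0 - ref_preference)
    ref = (1.0 - pool_af) * ref_preference
    p_read = alt / (alt + ref)
    # Sequencing: each read samples one allele from the pool.
    return binomial(rng, depth, p_read) / depth


def mean_relative_error(true_af, pool_size, depth, reps=200, seed=0, **kwargs):
    """Average |estimate - true_af| / true_af over `reps` simulated pools."""
    rng = random.Random(seed)
    errors = [
        abs(simulate_pooled_estimate(true_af, pool_size, depth, rng=rng, **kwargs)
            - true_af) / true_af
        for _ in range(reps)
    ]
    return sum(errors) / reps


if __name__ == "__main__":
    # Hypothetical settings: a small shallow pool versus a large deep pool.
    for pool_size, depth in [(10, 100), (200, 5000)]:
        err = mean_relative_error(0.05, pool_size, depth)
        print(f"pool={pool_size:4d} depth={depth:5d} relative error={err:.2f}")
```

Under these assumptions, comparing a small, shallow pool with a large, deep one gives a quick feel for the direction of the effect the abstract describes; the magnitudes depend entirely on the illustrative parameters and should not be read as the paper's results.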
Pages: 9