An empirical Bayes method for genotyping and SNP detection using multi-sample next-generation sequencing data

Cited by: 3
Authors
Huang, Gongyi [1]
Wang, Shaoli [2, 3]
Wang, Xueqin [1, 4, 5]
You, Na [1, 4]
Affiliations
[1] Sun Yat Sen Univ, Sch Math, Guangzhou 510275, Guangdong, Peoples R China
[2] Shanghai Univ Finance & Econ, Sch Stat & Management, Shanghai 200433, Peoples R China
[3] Shanghai Univ Finance & Econ, Shanghai Key Lab Financial Informat Technol, Shanghai 200433, Peoples R China
[4] Sun Yat Sen Univ, South China Ctr Stat Sci, Guangzhou 510275, Guangdong, Peoples R China
[5] Sun Yat Sen Univ, Zhongshan Sch Med, Guangzhou 510080, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education;
Keywords
MODEL SELECTION; DISCOVERY; FRAMEWORK; VARIANTS;
DOI
10.1093/bioinformatics/btw409
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
Motivation: The development of next-generation sequencing technology provides an efficient and powerful approach to rare variant detection. To identify genetic variations, the essential question is how to quantify the sequencing error rate in the data. Because it is easy to implement and can integrate data from different sources, the empirical Bayes method is widely used to estimate the sequencing error rate for SNP detection. Results: We propose a novel statistical model to fit the observed non-reference allele frequency data, and use the empirical Bayes method for both genotyping and SNP detection, with an ECM algorithm implemented to estimate the model parameters. The performance of the proposed method is investigated via simulations and real data analysis. The results show that our method makes fewer genotype-call errors and that, with the parameter estimates from the ECM algorithm, it attains high detection power while keeping the false discovery rate (FDR) well controlled.
Pages: 3240-3245
Page count: 6