Empirical Bayes single nucleotide variant-calling for next-generation sequencing data

Times cited: 0
Authors
Karimnezhad, Ali [1,2]
Perkins, Theodore J. [3,4]
Affiliations
[1] Univ Ottawa, Dept Math & Stat, Ottawa, ON K1N 9A7, Canada
[2] Bur Food Surveillance & Sci Integrat, Biostat & Risk Modelling Div, Hlth Prod & Food Branch, Food Directorate, Ottawa, ON K1A 0K9, Canada
[3] Ottawa Hosp Res Inst, Regenerat Med Program, Ottawa, ON K1H 8L6, Canada
[4] Univ Ottawa, Dept Biochem Microbiol & Immunol, Ottawa, ON K1H 8M5, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
SNP DETECTION; CANCER; DISCOVERY;
DOI
10.1038/s41598-024-51958-z
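Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy, earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract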
One of the fundamental computational problems in cancer genomics is the identification of single nucleotide variants (SNVs) from DNA sequencing data. Many statistical models and software implementations for SNV calling have been developed in the literature, yet they still disagree widely on real datasets. Based on an empirical Bayesian approach, we introduce a local false discovery rate (LFDR) estimator for germline SNV calling. Our approach learns model parameters without prior information and simultaneously accounts for information across all sites in the genomic regions of interest. We also propose another LFDR-based algorithm that reliably prioritizes a given list of mutations called by any other variant-calling algorithm. We use a suite of gold-standard cell line data to compare our LFDR approach against a collection of widely used, state-of-the-art programs. We find that our LFDR approach approximately matches or exceeds the performance of all of these programs, despite some very large differences among them. Furthermore, when prioritizing other algorithms' calls by our LFDR score, we find that by adjusting the trade-off between type I and type II errors we can select subsets of variant calls with minimal loss of sensitivity but dramatic increases in precision.
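The abstract does not spell out the model, so the following is only a minimal sketch of the general empirical Bayes LFDR idea, not the authors' estimator. It assumes a hypothetical three-component binomial mixture for alternate-allele read counts (no variant, heterozygous, homozygous alternate) with a fixed, assumed sequencing error rate `err`. Mixing weights are learned by EM from flat starting values, pooling all sites, and each site's LFDR is its posterior probability of "no variant". The names `lfdr_em`, `select_by_lfdr`, `err` and `q` are illustrative assumptions. The selection step uses the standard fact that the mean LFDR of a selected set estimates that set's false discovery rate, so `q` acts as the type I / type II trade-off knob.

```python
import math
from typing import List, Sequence, Tuple


def binom_logpmf(k: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) probability of k successes, for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def lfdr_em(alt: Sequence[int], depth: Sequence[int], err: float = 0.01,
            n_iter: int = 500, tol: float = 1e-10) -> Tuple[List[float], List[float]]:
    """Fit mixing weights by EM over all sites; return (weights, per-site LFDR).

    ASSUMED three-component model (illustrative, not the paper's model):
      0: no variant      -> alt reads ~ Binomial(depth, err)
      1: heterozygous    -> alt reads ~ Binomial(depth, 0.5)
      2: homozygous alt  -> alt reads ~ Binomial(depth, 1 - err)
    """
    probs = (err, 0.5, 1.0 - err)
    # Per-site log-likelihood under each component (fixed during EM).
    loglik = [[binom_logpmf(k, n, p) for p in probs] for k, n in zip(alt, depth)]

    def posteriors(w: List[float]) -> List[List[float]]:
        # E-step: posterior probability of each component at every site,
        # computed in log space and normalized with the max trick.
        out = []
        for ll in loglik:
            logs = [math.log(max(wj, 1e-300)) + lj for wj, lj in zip(w, ll)]
            m = max(logs)
            unnorm = [math.exp(v - m) for v in logs]
            s = sum(unnorm)
            out.append([u / s for u in unnorm])
        return out

    w = [1.0 / 3.0] * 3  # flat starting weights: no prior information
    for _ in range(n_iter):
        resp = posteriors(w)
        # M-step: each weight is the average responsibility across ALL sites,
        # which is how information is pooled over the region of interest.
        new_w = [sum(r[j] for r in resp) / len(resp) for j in range(3)]
        done = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if done:
            break

    # LFDR_i = P(no variant | data at site i) under the fitted weights.
    lfdr = [r[0] for r in posteriors(w)]
    return w, lfdr


def select_by_lfdr(lfdr: Sequence[float], q: float = 0.05) -> List[int]:
    """Indices of the largest lowest-LFDR set whose mean LFDR is <= q.

    The mean LFDR of a selected set estimates its false discovery rate, so
    lowering q gives fewer type I errors (higher precision) at the cost of
    more type II errors (lower sensitivity).
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    best, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        total += lfdr[i]
        # Running mean of ascending values never decreases, so stop at first miss.
        if total / rank > q:
            break
        best = rank
    return order[:best]


if __name__ == "__main__":
    # Toy data: ten sites at depth 30; sites 4, 5, 9 look heterozygous, site 6 homozygous.
    alt = [0, 1, 0, 2, 15, 14, 29, 0, 1, 16]
    depth = [30] * len(alt)
    weights, lfdr = lfdr_em(alt, depth)
    print("weights:", [round(x, 3) for x in weights])
    print("selected sites:", sorted(select_by_lfdr(lfdr, q=0.05)))
```

On this toy input the fitted weights come out close to 0.6, 0.3 and 0.1, matching the 6/3/1 split of the simulated sites, and the selection keeps sites 4, 5, 6 and 9. The same `select_by_lfdr` step can be applied to LFDR scores computed for any externally supplied call list, which mirrors, in spirit only, the prioritization use described above.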
Pages: 14