Joint Variant and De Novo Mutation Identification on Pedigrees from High-Throughput Sequencing Data

被引:56
作者
Cleary, John G. [1 ]
Braithwaite, Ross [1 ]
Gaastra, Kurt [1 ]
Hilbush, Brian S. [2 ]
Inglis, Stuart [3 ]
Irvine, Sean A. [1 ]
Jackson, Alan [1 ]
Littin, Richard [3 ]
Nohzadeh-Malakshah, Sahar [2 ]
Rathod, Mehul [1 ]
Ware, David [1 ]
Trigg, Len [1 ]
De La Vega, Francisco M. [2 ]
机构
[1] Real Time Genom Ltd, Hamilton, New Zealand
[2] Real Time Genom Inc, San Bruno, CA USA
[3] NetValue Ltd, Hamilton, New Zealand
关键词
Bayesian networks; indel; multiple-nucleotide polymorphism (MNP); next-generation sequencing; pedigree; single-nucleotide variant (SNV); trios; variant calling; STRUCTURAL VARIATION; DISCOVERY; RESOURCE; FORMAT; INDEL;
D O I
10.1089/cmb.2014.0029
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
The analysis of whole-genome or exome sequencing data from trios and pedigrees has been successfully applied to the identification of disease-causing mutations. However, most methods used to identify and genotype genetic variants from next-generation sequencing data ignore the relationships between samples, resulting in significant Mendelian errors, false positives and negatives. Here we present a Bayesian network framework that jointly analyzes data from all members of a pedigree simultaneously using Mendelian segregation priors, yet providing the ability to detect de novo mutations in offspring, and is scalable to large pedigrees. We evaluated our method by simulations and analysis of whole-genome sequencing (WGS) data from a 17-individual, 3-generation CEPH pedigree sequenced to 50x average depth. Compared with singleton calling, our family caller produced more high-quality variants and eliminated spurious calls as judged by common quality metrics such as Ti/Tv, Het/Hom ratios, and dbSNP/SNP array data concordance, and by comparing to ground truth variant sets available for this sample. We identify all previously validated de novo mutations in NA12878, concurrent with a 7x precision improvement. Our results show that our method is scalable to large genomics and human disease studies.
引用
收藏
页码:405 / 419
页数:15
相关论文
共 29 条
[1]   Accurate and comprehensive sequencing of personal genomes [J].
Ajay, Subramanian S. ;
Parker, Stephen C. J. ;
Abaan, Hatice Ozel ;
Fajardo, Karin V. Fuentes ;
Margulies, Elliott H. .
GENOME RESEARCH, 2011, 21 (09) :1498-1505
[2]   A map of human genome variation from population-scale sequencing [J].
Altshuler, David ;
Durbin, Richard M. ;
Abecasis, Goncalo R. ;
Bentley, David R. ;
Chakravarti, Aravinda ;
Clark, Andrew G. ;
Collins, Francis S. ;
De la Vega, Francisco M. ;
Donnelly, Peter ;
Egholm, Michael ;
Flicek, Paul ;
Gabriel, Stacey B. ;
Gibbs, Richard A. ;
Knoppers, Bartha M. ;
Lander, Eric S. ;
Lehrach, Hans ;
Mardis, Elaine R. ;
McVean, Gil A. ;
Nickerson, DebbieA. ;
Peltonen, Leena ;
Schafer, Alan J. ;
Sherry, Stephen T. ;
Wang, Jun ;
Wilson, Richard K. ;
Gibbs, Richard A. ;
Deiros, David ;
Metzker, Mike ;
Muzny, Donna ;
Reid, Jeff ;
Wheeler, David ;
Wang, Jun ;
Li, Jingxiang ;
Jian, Min ;
Li, Guoqing ;
Li, Ruiqiang ;
Liang, Huiqing ;
Tian, Geng ;
Wang, Bo ;
Wang, Jian ;
Wang, Wei ;
Yang, Huanming ;
Zhang, Xiuqing ;
Zheng, Huisong ;
Lander, Eric S. ;
Altshuler, David L. ;
Ambrogio, Lauren ;
Bloom, Toby ;
Cibulskis, Kristian ;
Fennell, Tim J. ;
Gabriel, Stacey B. .
NATURE, 2010, 467 (7319) :1061-1073
[3]  
[Anonymous], 2012, ARXIV12073907V2
[4]  
[Anonymous], 2012, Nature
[5]   A public resource facilitating clinical use of genomes [J].
Ball, Madeleine P. ;
Thakuria, Joseph V. ;
Zaranek, Alexander Wait ;
Clegg, Tom ;
Rosenbaum, Abraham M. ;
Wu, Xiaodi ;
Angrist, Misha ;
Bhak, Jong ;
Bobe, Jason ;
Callow, Matthew J. ;
Cano, Carlos ;
Chou, Michael F. ;
Chung, Wendy K. ;
Douglas, Shawn M. ;
Estep, Preston W. ;
Gore, Athurva ;
Hulick, Peter ;
Labarga, Alberto ;
Lee, Je-Hyuk ;
Lunshof, Jeantine E. ;
Kim, Byung Chul ;
Kim, Jong-Il ;
Li, Zhe ;
Murray, Michael F. ;
Nilsen, Geoffrey B. ;
Peters, Brock A. ;
Raman, Anugraha M. ;
Rienhoff, Hugh Y. ;
Robasky, Kimberly ;
Wheeler, Matthew T. ;
Vandewege, Ward ;
Vorhaus, Daniel B. ;
Yang, Joyce L. ;
Yang, Luhan ;
Aach, John ;
Ashley, Euan A. ;
Drmanac, Radoje ;
Kim, Seong-Jin ;
Li, Jin Billy ;
Peshkin, Leonid ;
Seidman, Christine E. ;
Seo, Jeong-Sun ;
Zhang, Kun ;
Rehm, Heidi L. ;
Church, George M. .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2012, 109 (30) :11920-11927
[6]  
Breiman L., 2001, Learn, V45, P5
[7]   A Family-Based Probabilistic Method for Capturing De Novo Mutations from High-Throughput Short-Read Sequencing Data [J].
Cartwright, Reed A. ;
Hussin, Julie ;
Keebler, Jonathan E. M. ;
Stone, Eric A. ;
Awadalla, Philip .
STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2012, 11 (02)
[8]   Variation in genome-wide mutation rates within and between human families [J].
Conrad, Donald F. ;
Keebler, Jonathan E. M. ;
DePristo, Mark A. ;
Lindsay, Sarah J. ;
Zhang, Yujun ;
Casals, Ferran ;
Idaghdour, Youssef ;
Hartl, Chris L. ;
Torroja, Carlos ;
Garimella, Kiran V. ;
Zilversmit, Martine ;
Cartwright, Reed ;
Rouleau, Guy A. ;
Daly, Mark ;
Stone, Eric A. ;
Hurles, Matthew E. ;
Awadalla, Philip .
NATURE GENETICS, 2011, 43 (07) :712-U137
[9]   The variant call format and VCFtools [J].
Danecek, Petr ;
Auton, Adam ;
Abecasis, Goncalo ;
Albers, Cornelis A. ;
Banks, Eric ;
DePristo, Mark A. ;
Handsaker, Robert E. ;
Lunter, Gerton ;
Marth, Gabor T. ;
Sherry, Stephen T. ;
McVean, Gilean ;
Durbin, Richard .
BIOINFORMATICS, 2011, 27 (15) :2156-2158
[10]   A framework for variation discovery and genotyping using next-generation DNA sequencing data [J].
DePristo, Mark A. ;
Banks, Eric ;
Poplin, Ryan ;
Garimella, Kiran V. ;
Maguire, Jared R. ;
Hartl, Christopher ;
Philippakis, Anthony A. ;
del Angel, Guillermo ;
Rivas, Manuel A. ;
Hanna, Matt ;
McKenna, Aaron ;
Fennell, Tim J. ;
Kernytsky, Andrew M. ;
Sivachenko, Andrey Y. ;
Cibulskis, Kristian ;
Gabriel, Stacey B. ;
Altshuler, David ;
Daly, Mark J. .
NATURE GENETICS, 2011, 43 (05) :491-+