Blacklisting variants common in private cohorts but not in public databases optimizes human exome analysis

Cited by: 47
Authors
Maffucci, Patrick [1 ,2 ,3 ]
Bigio, Benedetta [1 ,4 ,5 ]
Rapaport, Franck [1 ]
Cobat, Aurelie [4 ,5 ]
Borghesi, Alessandro [6 ]
Lopez, Marie [7 ,8 ,9 ]
Patin, Etienne [7 ,8 ,9 ]
Bolze, Alexandre [10 ]
Shang, Lei [1 ]
Bendavid, Matthieu [1 ]
Scott, Eric M. [11 ]
Stenson, Peter D. [12 ]
Cunningham-Rundles, Charlotte [2 ,3 ]
Cooper, David N. [12 ]
Gleeson, Joseph G. [11 ,13 ]
Fellay, Jacques [6 ]
Quintana-Murci, Lluis [7 ,8 ,9 ]
Casanova, Jean-Laurent [1 ,4 ,5 ,13 ,14 ]
Abel, Laurent [1 ,4 ,5 ]
Boisson, Bertrand [1 ,4 ,5 ]
Itan, Yuval [1 ,15 ,16 ]
Affiliations
[1] Rockefeller Univ, Rockefeller Branch, St Giles Lab Human Genet Infect Dis, New York, NY 10065 USA
[2] Icahn Sch Med Mt Sinai, Grad Sch, Immunol Inst, New York, NY 10029 USA
[3] Icahn Sch Med Mt Sinai, Dept Med, Div Clin Immunol, New York, NY 10029 USA
[4] Necker Hosp Sick Children, INSERM U1163, Necker Branch, Lab Human Genet Infect Dis, F-75015 Paris, France
[5] Paris Descartes Univ, Imagine Inst, F-75015 Paris, France
[6] Ecole Polytech Fed Lausanne, Sch Life Sci, CH-1015 Lausanne, Switzerland
[7] Pasteur Inst, Human Evolutionary Genet Unit, F-75015 Paris, France
[8] CNRS UMR2000, F-75015 Paris, France
[9] Pasteur Inst, Ctr Bioinformat Biostat & Integrat Biol, F-75015 Paris, France
[10] Helix, San Carlos, CA 94070 USA
[11] Univ Calif San Diego, Dept Neurosci, Rady Childrens Inst Genom Med, La Jolla, CA 92093 USA
[12] Cardiff Univ, Sch Med, Inst Med Genet, Cardiff CF14 4XW, S Glam, Wales
[13] Howard Hughes Med Inst, New York, NY 10065 USA
[14] Necker Hosp Sick Children, Pediat Hematol Immunol Unit, F-75015 Paris, France
[15] Icahn Sch Med Mt Sinai, Charles Bronfman Inst Personalized Med, New York, NY 10029 USA
[16] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, New York, NY 10029 USA
Funding
US National Institutes of Health (NIH);
Keywords
exome; variant; blacklist; WES analysis; WES annotation; GENE MUTATION DATABASE; GENOME; MONONUCLEOTIDE; GUIDELINES; FRAMEWORK; DISEASE; ERRORS;
DOI
10.1073/pnas.1808403116
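Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract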
Computational analyses of human patient exomes aim to filter out as many nonpathogenic genetic variants (NPVs) as possible, without removing the true disease-causing mutations. This involves comparing the patient's exome with public databases to remove reported variants inconsistent with disease prevalence, mode of inheritance, or clinical penetrance. However, variants frequent in a given exome cohort, but absent or rare in public databases, have also been reported and treated as NPVs, without rigorous exploration. We report the generation of a blacklist of variants frequent within an in-house cohort of 3,104 exomes. This blacklist did not remove known pathogenic mutations from the exomes of 129 patients and decreased the number of NPVs remaining in the 3,104 individual exomes by a median of 62%. We validated this approach by testing three other independent cohorts of 400, 902, and 3,869 exomes. The blacklist generated from any given cohort removed a substantial proportion of NPVs (11-65%). We analyzed the blacklisted variants computationally and experimentally. Most of the blacklisted variants corresponded to false signals generated by incomplete reference genome assembly, location in low-complexity regions, bioinformatic misprocessing, or limitations inherent to cohort-specific private alleles (e.g., due to sequencing kits, and genetic ancestries). Finally, we provide our precalculated blacklists, together with ReFiNE, a program for generating customized blacklists from any medium-sized or large in-house cohort of exome (or other next-generation sequencing) data via a user-friendly public web server. This work demonstrates the power of extracting variant blacklists from private databases as a specific in-house but broadly applicable tool for optimizing exome analysis.
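The core idea of the abstract, flagging variants that are frequent within a private cohort but absent or rare in public databases, can be illustrated with a minimal Python sketch. This is not the ReFiNE implementation: the function names, the variant tuple layout, and both thresholds (`min_cohort_fraction`, `max_public_freq`) are assumptions chosen for illustration only, and the abstract does not state the cutoffs the authors used.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def build_blacklist(
    cohort_exomes: Iterable[Set[Variant]],
    public_freq: Dict[Variant, float],
    min_cohort_fraction: float = 0.01,  # assumed cutoff, not taken from the paper
    max_public_freq: float = 0.01,      # assumed cutoff, not taken from the paper
) -> Set[Variant]:
    """Return variants carried by many cohort exomes but rare or absent publicly."""
    exomes = list(cohort_exomes)
    if not exomes:
        return set()
    # Count in how many exomes each variant appears (each exome is a set,
    # so a variant is counted at most once per individual).
    carriers = Counter(v for exome in exomes for v in exome)
    n = len(exomes)
    return {
        v
        for v, count in carriers.items()
        if count / n >= min_cohort_fraction
        # Variants missing from the public table are treated as frequency 0.
        and public_freq.get(v, 0.0) < max_public_freq
    }


def apply_blacklist(exome: Set[Variant], blacklist: Set[Variant]) -> Set[Variant]:
    """Remove blacklisted variants from one individual's variant set."""
    return exome - blacklist


if __name__ == "__main__":
    a = ("1", 100, "A", "G")
    b = ("2", 200, "C", "T")
    c = ("3", 300, "G", "A")
    cohort = [{a, b}, {a, c}, {a}, {b}]
    blacklist = build_blacklist(cohort, {b: 0.3}, min_cohort_fraction=0.5)
    print(blacklist)
    print(apply_blacklist({a, c}, blacklist))
```

In this toy cohort, variant `a` is carried by three of four exomes and is absent from the public table, so it is blacklisted; `b` is equally common in the cohort but also common publicly, so a standard public-frequency filter would already handle it; `c` is too rare in the cohort to qualify.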
Pages: 950-959
Page count: 10