Human whole-exome genotype data for Alzheimer's disease

Cited by: 0
Authors
Leung, Yuk Yee [1]
Naj, Adam C. [1, 2]
Chou, Yi-Fan [1]
Valladares, Otto [1]
Schmidt, Michael [3, 4]
Hamilton-Nelson, Kara [3, 4]
Wheeler, Nicholas [5, 6]
Lin, Honghuang [7]
Gangadharan, Prabhakaran [1]
Qu, Liming [1]
Clark, Kaylyn [1]
Kuzma, Amanda B. [1]
Lee, Wan-Ping [1]
Cantwell, Laura [1]
Nicaretta, Heather [1]
van der Lee, Sven [18]
English, Adam [19]
Kalra, Divya [19]
Muzny, Donna [19]
Skinner, Evette [19]
Doddapeneni, Harsha [19]
Dinh, Huyen [19]
Hu, Jianhong [19]
Santibanez, Jireh [19]
Jayaseelan, Joy [19]
Worley, Kim [19]
Gibbs, Richard A. [19]
Lee, Sandra [19]
Dugan-Perez, Shannon [19]
Korchina, Viktoriya [19]
Nasser, Waleed [19]
Liu, Xiuping [19]
Han, Yi [19]
Zhu, Yiming [19]
Liu, Yue [19]
Khan, Ziad [19]
Zhu, Congcong [10]
Sun, Fangui Jenny [10]
Jun, Gyungah R. [10]
Chung, Jaeyoon [10]
Farrell, John [10]
Zhang, Xiaoling [10]
Banks, Eric [20]
Gupta, Namrata [20]
Gabriel, Stacey [20]
Butkiewicz, Mariusz [5, 6]
Benchek, Penelope [5, 6]
Smieszek, Sandra [5, 6]
Song, Yeunjoo [5, 6]
Vardarajan, Badri [14, 15, 16]
Affiliations
[1] Univ Penn, Penn Neurodegenerat Genom Ctr, Perelman Sch Med, Dept Pathol & Lab Med, Philadelphia, PA 19104 USA
[2] Univ Penn, Perelman Sch Med, Dept Biostat Epidemiol & Informat, Philadelphia, PA USA
[3] Univ Miami, Miller Sch Med, Dr John T Macdonald Fdn Dept Human Genet, Miami, FL 33136 USA
[4] Univ Miami, John P Hussman Inst Human Genom, Miami, FL USA
[5] Case Western Reserve Univ, Dept Populat & Quantitat Hlth Sci, Cleveland Hts, OH USA
[6] Case Western Reserve Univ, Sch Med, Dept Genet & Genome Sci, Cleveland, OH 44106 USA
[7] UMass Chan Med Sch, Dept Med, Worcester, MA USA
[8] Boston Univ, Chobanian & Avedisian Sch Med, Dept Med Biomed Genet, Boston, MA 02215 USA
[9] Boston Univ, Sch Publ Hlth, Dept Biostat, Boston, MA USA
[10] Boston Univ, Sch Med, Boston, MA USA
[11] Univ Texas Hlth Sci Ctr San Antonio, Glenn Biggs Inst Alzheimers & Neurodegenerat Dis, San Antonio, TX 78229 USA
[12] Univ Washington, Dept Psychiat & Behav Sci, Seattle, WA USA
[13] Washington Univ, Sch Med, St Louis, MO USA
[14] Taub Inst Res Alzheimers Dis & Aging Brain, Dept Neurol, 630W 168th St, New York, NY 10032 USA
[15] Columbia Univ, Gertrude H Sergievsky Ctr, New York, NY USA
[16] New York Presbyterian Hosp, New York, NY USA
[17] Boston Univ, Sch Med, Dept Neurol, Boston, MA USA
[18] Amsterdam UMC, Amsterdam, Netherlands
[19] Baylor Coll Med, Houston, TX USA
[20] Broad Inst Harvard, Cambridge, MA USA
[21] Erasmus Univ, Rotterdam, Netherlands
[22] Indiana Univ, Ft Wayne, IN USA
[23] Johnson & Johnson, Titusville, FL USA
[24] Med Univ Graz, Graz, Austria
[25] MITRE, Mclean, VA USA
[26] Mt Sinai Sch Med New York, New York, NY USA
[27] Natl Ctr Biotechnol Informat, Bethesda, MD 20894 USA
[28] Natl Inst Aging, Bethesda, MD USA
[29] Nationwide Childrens, Columbus, OH USA
[30] Univ Oxford, Oxford, Oxfordshire, England
[31] Regeneron, Tarrytown, NY USA
[32] Rush Univ, Chicago, IL USA
[33] Stanford Univ, Stanford, CA USA
[34] Univ Colorado, Boulder, CO 80309 USA
[35] Univ Mississippi, Oxford, MS USA
[36] Univ Washington, Seattle, WA USA
[37] Vanderbilt Univ, Nashville, TN USA
Keywords
GENOME; FRAMEWORK;
DOI
10.1038/s41467-024-44781-7
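Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification code
07; 0710; 09;
Abstract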
The heterogeneity of whole-exome sequencing (WES) data generation methods presents a challenge to joint analysis. Here we present a bioinformatics strategy for joint-calling 20,504 WES samples collected across nine studies and sequenced using ten capture kits in fourteen sequencing centers in the Alzheimer's Disease Sequencing Project. The joint-genotyped variant call format (VCF) file contains only positions within the union of the capture kits. The VCF was then processed specifically to account for the batch effects arising from the use of different capture kits in different studies. We identified 8.2 million autosomal variants. 96.82% of the variants are high-quality and are located in 28,579 Ensembl transcripts. 41% of the variants are intronic, and 1.8% have CADD > 30, indicating high predicted pathogenicity. Here we show that our new strategy can generate high-quality data from these diversely generated WES samples. The improved ability to combine data sequenced in different batches benefits the whole genomics research community.
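As an illustration of the "union of capture kits" restriction mentioned in the abstract, the following is a minimal sketch, not the authors' pipeline. It assumes each kit's targets are given as BED-style half-open intervals; the function names `union_of_targets` and `in_union` and the example coordinates are hypothetical.

```python
from collections import defaultdict


def union_of_targets(kits):
    """Merge target intervals from several capture kits into one union set.

    `kits` is a list of kits; each kit is a list of (chrom, start, end)
    half-open intervals, as in BED files (an assumption of this sketch).
    """
    by_chrom = defaultdict(list)
    for kit in kits:
        for chrom, start, end in kit:
            by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlapping or adjacent: extend the current interval.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def in_union(merged, chrom, pos):
    """Return True if 0-based position `pos` lies in any merged interval."""
    return any(start <= pos < end for start, end in merged.get(chrom, []))


# Hypothetical targets for two kits.
kit_a = [("chr1", 100, 200), ("chr1", 500, 600)]
kit_b = [("chr1", 150, 250), ("chr2", 10, 20)]
merged = union_of_targets([kit_a, kit_b])
```

A variant position would be kept only when `in_union` returns True for it. A real workflow would more likely use an established interval tool for this step; the sketch only shows the logic.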
Pages: 15
Related papers
31 in total
  • [1] The Ensembl gene annotation system
    Aken, Bronwen L.
    Ayling, Sarah
    Barrell, Daniel
    Clarke, Laura
    Curwen, Valery
    Fairley, Susan
    Banet, Julio Fernandez
    Billis, Konstantinos
    Giron, Carlos Garcia
    Hourlier, Thibaut
    Howe, Kevin
    Kahari, Andreas
    Kokocinski, Felix
    Martin, Fergal J.
    Murphy, Daniel N.
    Nag, Rishi
    Ruffier, Magali
    Schuster, Michael
    Tang, Y. Amy
    Vogel, Jan-Hinnerk
    White, Simon
    Zadissa, Amonida
    Flicek, Paul
    Searle, Stephen M. J.
    [J]. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2016,
  • [2] THE ALZHEIMER'S DISEASE SEQUENCING PROJECT: STUDY DESIGN AND SAMPLE SELECTION
    Beecham, Gary W.
    Bis, J. C.
    Martin, E. R.
    Choi, S. -H.
    DeStefano, A. L.
    van Duijn, C. M.
    Fornage, M.
    Gabriel, S. B.
    Koboldt, D. C.
    Larson, D. E.
    Naj, A. C.
    Psaty, B. M.
    Salerno, W.
    Bush, W. S.
    Foroud, T. M.
    Wijsman, E.
    Farrer, L. A.
    Goate, A.
    Haines, J. L.
    Pericak-Vance, Margaret A.
    Boerwinkle, E.
    Mayeux, R.
    Seshadri, S.
    Schellenberg, G.
    [J]. NEUROLOGY-GENETICS, 2017, 3 (05)
  • [3] Bis JC, 2020, MOL PSYCHIATR, V25, P1859, DOI 10.1038/s41380-018-0112-7
  • [4] Functional annotation of genomic variants in studies of late-onset Alzheimer's disease
    Butkiewicz, Mariusz
    Blue, Elizabeth E.
    Leung, Yuk Yee
    Jian, Xueqiu
    Marcora, Edoardo
    Renton, Alan E.
    Kuzma, Amanda
    Wang, Li-San
    Koboldt, Daniel C.
    Haines, Jonathan L.
    Bush, William S.
    [J]. BIOINFORMATICS, 2018, 34 (16) : 2724 - 2731
  • [5] An integrative variant analysis suite for whole exome next-generation sequencing data
    Challis, Danny
    Yu, Jin
    Evani, Uday S.
    Jackson, Andrew R.
    Paithankar, Sameer
    Coarfa, Cristian
    Milosavljevic, Aleksandar
    Gibbs, Richard A.
    Yu, Fuli
    [J]. BMC BIOINFORMATICS, 2012, 13
  • [6] A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3
    Cingolani, Pablo
    Platts, Adrian
    Wang, Le Lily
    Coon, Melissa
    Tung Nguyen
    Wang, Luan
    Land, Susan J.
    Lu, Xiangyi
    Ruden, Douglas M.
    [J]. FLY, 2012, 6 (02) : 80 - 92
  • [7] Performance comparison of exome DNA sequencing technologies
    Clark, Michael J.
    Chen, Rui
    Lam, Hugo Y. K.
    Karczewski, Konrad J.
    Chen, Rong
    Euskirchen, Ghia
    Butte, Atul J.
    Snyder, Michael
    [J]. NATURE BIOTECHNOLOGY, 2011, 29 (10) : 908 - U206
  • [8] A framework for variation discovery and genotyping using next-generation DNA sequencing data
    DePristo, Mark A.
    Banks, Eric
    Poplin, Ryan
    Garimella, Kiran V.
    Maguire, Jared R.
    Hartl, Christopher
    Philippakis, Anthony A.
    del Angel, Guillermo
    Rivas, Manuel A.
    Hanna, Matt
    McKenna, Aaron
    Fennell, Tim J.
    Kernytsky, Andrew M.
    Sivachenko, Andrey Y.
    Cibulskis, Kristian
    Gabriel, Stacey B.
    Altshuler, David
    Daly, Mark J.
    [J]. NATURE GENETICS, 2011, 43 (05) : 491 - +
  • [9] Exome sequencing identifies rare damaging variants in ATP8B4 and ABCA1 as risk factors for Alzheimer's disease
    Holstege, Henne
    Hulsman, Marc
    Charbonnier, Camille
    Grenier-Boley, Benjamin
    Quenez, Olivier
    Grozeva, Detelina
    van Rooij, Jeroen G. J.
    Sims, Rebecca
    Ahmad, Shahzad
    Amin, Najaf
    Norsworthy, Penny J.
    Dols-Icardo, Oriol
    Hummerich, Holger
    Kawalia, Amit
    Amouyel, Philippe
    Beecham, Gary W.
    Berr, Claudine
    Bis, Joshua C.
    Boland, Anne
    Bossu, Paola
    Bouwman, Femke
    Bras, Jose
    Campion, Dominique
    Cochran, J. Nicholas
    Daniele, Antonio
    Dartigues, Jean-Francois
    Debette, Stephanie
    Deleuze, Jean-Francois
    Denning, Nicola
    DeStefano, Anita L.
    Farrer, Lindsay A.
    Fernandez, Maria Victoria
    Fox, Nick C.
    Galimberti, Daniela
    Genin, Emmanuelle
    Gille, Johan J. P.
    Le Guen, Yann
    Guerreiro, Rita
    Haines, Jonathan L.
    Holmes, Clive
    Ikram, M. Arfan
    Ikram, M. Kamran
    Jansen, Iris E.
    Kraaij, Robert
    Lathrop, Marc
    Lemstra, Afina W.
    Lleo, Alberto
    Luckcuck, Lauren
    Mannens, Marcel M. A. M.
    Marshall, Rachel
    [J]. NATURE GENETICS, 2022, 54 (12) : 1786 - 1794
  • [10] NIA-AA Research Framework: Toward a biological definition of Alzheimer's disease
    Jack, Clifford R., Jr.
    Bennett, David A.
    Blennow, Kaj
    Carrillo, Maria C.
    Dunn, Billy
    Haeberlein, Samantha Budd
    Holtzman, David M.
    Jagust, William
    Jessen, Frank
    Karlawish, Jason
    Liu, Enchi
    Luis Molinuevo, Jose
    Montine, Thomas
    Phelps, Creighton
    Rankin, Katherine P.
    Rowe, Christopher C.
    Scheltens, Philip
    Siemers, Eric
    Snyder, Heather M.
    Sperling, Reisa
    Elliott, Cerise
    Masliah, Eliezer
    Ryan, Laurie
    Silverberg, Nina
    [J]. ALZHEIMERS & DEMENTIA, 2018, 14 (04) : 535 - 562