Utility of long-read sequencing for All of Us

Cited by: 16
Authors
Mahmoud, M. [1, 2]
Huang, Y. [3]
Garimella, K. [3]
Audano, P. A. [4]
Wan, W. [3]
Prasad, N. [5]
Handsaker, R. E. [6, 7]
Hall, S. [5]
Pionzio, A. [5]
Schatz, M. C. [8]
Talkowski, M. E. [7, 9]
Eichler, E. E. [10, 11]
Levy, S. E. [12]
Sedlazeck, F. J. [1, 2, 13]
Affiliations
[1] Baylor Coll Med, Human Genome Sequencing Ctr, Houston, TX 77030 USA
[2] Baylor Coll Med, Dept Mol & Human Genet, Houston, TX 77030 USA
[3] Broad Inst MIT & Harvard, Data Sci Platform, Cambridge, MA 02141 USA
[4] Jackson Lab Genom Med, Farmington, CT 06032 USA
[5] Discovery Life Sci, Huntsville, AL 35806 USA
[6] Harvard Med Sch, Dept Genet, Boston, MA USA
[7] Broad Inst MIT & Harvard, Program Med & Populat Genet, Cambridge, MA 02141 USA
[8] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD USA
[9] Massachusetts Gen Hosp, Ctr Genom Med, Boston, MA USA
[10] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA USA
[11] Univ Washington, Howard Hughes Med Inst, Seattle, WA USA
[12] HudsonAlpha Inst Biotechnol, Huntsville, AL 35806 USA
[13] Rice Univ, Dept Comp Sci, Houston, TX 77005 USA
Funding
National Institutes of Health (NIH), USA
Keywords
MISSING HERITABILITY; DISEASES; GENOME
DOI
10.1038/s41467-024-44804-3
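Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract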
The All of Us (AoU) initiative aims to sequence the genomes of over one million Americans from diverse ethnic backgrounds to improve personalized medical care. In a recent technical pilot, we compare the performance of traditional short-read sequencing with long-read sequencing in a small cohort of samples from the HapMap project and two AoU control samples representing eight datasets. Our analysis reveals substantial differences in the ability of these technologies to accurately sequence complex medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification. We also consider the advantages and challenges of using low-coverage sequencing to increase sample numbers in large cohort analysis. Our results show that HiFi reads produce the most accurate results for both small and large variants. Further, we present a cloud-based pipeline to optimize SNV, indel and SV calling at scale for long-read analysis. These results lead to widespread improvements across AoU.
Editor's summary: Using All of Us pilot data, the authors compared short- and long-read performance across medically relevant genes and showcased the utility of long reads to improve variant detection and phasing in easy- and hard-to-resolve medically relevant genes.
Pages: 13