Long-read sequence and assembly of segmental duplications

Authors
Mitchell R. Vollger
Philip C. Dishuck
Melanie Sorensen
AnneMarie E. Welch
Vy Dang
Max L. Dougherty
Tina A. Graves-Lindsay
Richard K. Wilson
Mark J. P. Chaisson
Evan E. Eichler
Affiliations
[1] University of Washington School of Medicine, Department of Genome Sciences
[2] The McDonnell Genome Institute at Washington University, Institute for Genomic Medicine
[3] Washington University School of Medicine, Department of Pediatrics
[4] Nationwide Children’s Hospital, Howard Hughes Medical Institute
[5] The Ohio State University College of Medicine
[6] University of Southern California
[7] University of Washington
Source
Nature Methods | 2019 / Volume 16
Abstract
We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA; https://github.com/mvollger/SDA) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33–79 megabase pairs (Mb) of duplications in which approximately half of the loci are diverged (<99.8%) compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level.
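The graph idea in the abstract can be illustrated with a toy sketch. This is not the SDA implementation: the input format (a per-read map of PSV calls), the function names, the unit edge weights, and the greedy grouping step are all assumptions made for illustration, and the greedy step stands in for whatever clustering SDA actually uses. Two PSVs seen together on one read gain an attraction edge; two PSVs spanned by the same read but carried only in part gain a repulsion edge.

```python
from collections import defaultdict
from itertools import combinations


def build_psv_graph(read_calls):
    """Return net edge weights between PSV pairs.

    read_calls maps read_id -> {psv_id: bool}, where True means the read
    carries the PSV allele (hypothetical input format).
    Positive weight = attraction, negative weight = repulsion.
    """
    weights = defaultdict(int)
    for calls in read_calls.values():
        for a, b in combinations(sorted(calls), 2):
            if calls[a] and calls[b]:
                weights[(a, b)] += 1  # both PSVs on the same read
            elif calls[a] != calls[b]:
                weights[(a, b)] -= 1  # read spans both sites, carries one
    return weights


def cluster_psvs(weights):
    """Greedily group PSVs: join the cluster with the highest positive
    total weight, otherwise start a new cluster."""
    nodes = sorted({n for pair in weights for n in pair})
    clusters = []
    for node in nodes:
        best, best_score = None, 0
        for cluster in clusters:
            score = sum(
                weights.get((min(node, m), max(node, m)), 0) for m in cluster
            )
            if score > best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append({node})
        else:
            best.add(node)
    return clusters


def partition_reads(read_calls, clusters):
    """Assign each read to the PSV cluster it shares the most PSVs with."""
    groups = defaultdict(list)
    for read_id, calls in read_calls.items():
        carried = {p for p, present in calls.items() if present}
        scores = [len(carried & c) for c in clusters]
        if scores and max(scores) > 0:
            groups[scores.index(max(scores))].append(read_id)
    return dict(groups)


# Toy data: two paralogs, one marked by p1/p2 and the other by p3/p4.
reads = {
    "r1": {"p1": True, "p2": True, "p3": False},
    "r2": {"p1": True, "p2": True, "p4": False},
    "r3": {"p3": True, "p4": True, "p1": False},
    "r4": {"p3": True, "p4": True, "p2": False},
}
clusters = cluster_psvs(build_psv_graph(reads))
groups = partition_reads(reads, clusters)
```

On this toy input the reads split into two groups, one per paralog, and each group could then be assembled separately.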
Pages: 88-94
Page count: 6