Long-read sequence and assembly of segmental duplications

Cited by: 0
Authors
Mitchell R. Vollger
Philip C. Dishuck
Melanie Sorensen
AnneMarie E. Welch
Vy Dang
Max L. Dougherty
Tina A. Graves-Lindsay
Richard K. Wilson
Mark J. P. Chaisson
Evan E. Eichler
Affiliations
[1] University of Washington School of Medicine, Department of Genome Sciences
[2] The McDonnell Genome Institute at Washington University, Institute for Genomic Medicine
[3] Washington University School of Medicine, Department of Pediatrics
[4] Nationwide Children’s Hospital, Howard Hughes Medical Institute
[5] The Ohio State University College of Medicine
[6] University of Southern California
[7] University of Washington
Source
Nature Methods | 2019 / Volume 16
Keywords
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA; https://github.com/mvollger/SDA) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33–79 megabase pairs (Mb) of duplications in which approximately half of the loci are diverged (<99.8% sequence identity) compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level.
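The graph idea in the abstract can be illustrated with a toy sketch. This is not the SDA implementation: the read encoding, the function names (`build_psv_graph`, `cluster_psvs`, `partition_reads`), and the clustering rule (connected components over net-positive edges) are all assumptions chosen for brevity. Each hypothetical read records, for the paralogous sequence variant (PSV) sites it spans, whether it carries the variant (1) or not (0).

```python
from collections import defaultdict
from itertools import combinations


def build_psv_graph(reads):
    """Net edge weight per PSV pair: +1 per attracting read, -1 per repelling read."""
    weights = defaultdict(int)
    for calls in reads:
        for a, b in combinations(sorted(calls), 2):
            if calls[a] == 1 and calls[b] == 1:
                weights[(a, b)] += 1  # attraction: read carries both variants
            elif calls[a] != calls[b]:
                weights[(a, b)] -= 1  # repulsion: read carries only one of them
    return weights


def cluster_psvs(weights):
    """Group PSVs joined by net-positive edges (simple union-find, an assumption)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), w in weights.items():
        find(a)
        find(b)
        if w > 0:
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return sorted(sorted(g) for g in groups.values())


def partition_reads(reads, clusters):
    """Assign each read to the PSV cluster whose variants it carries most often."""
    assignment = {}
    for i, calls in enumerate(reads):
        scores = [sum(calls.get(p, 0) for p in c) for c in clusters]
        best = max(range(len(clusters)), key=scores.__getitem__)
        if scores[best] > 0:
            assignment[i] = best
    return assignment


# Hypothetical toy data: two paralogs, tagged by PSVs {p1, p2} and {p3, p4}.
reads = [
    {"p1": 1, "p2": 1, "p3": 0, "p4": 0},
    {"p1": 1, "p2": 1, "p3": 0},
    {"p1": 0, "p3": 1, "p4": 1},
    {"p2": 0, "p3": 1, "p4": 1},
]
clusters = cluster_psvs(build_psv_graph(reads))
assignment = partition_reads(reads, clusters)
print(clusters)
print(assignment)
```

In this sketch each resulting read group would then be assembled separately to yield one sequence per paralog; the published tool at the repository linked in the abstract should be consulted for the actual procedure.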
Pages: 88-94
Page count: 6
Related papers
50 items in total
  • [21] BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper
    Guidi, Giulia
    Ellis, Marquita
    Rokhsar, Daniel
    Yelick, Katherine
    Buluc, Aydin
    PROCEEDINGS OF THE 2021 SIAM CONFERENCE ON APPLIED AND COMPUTATIONAL DISCRETE ALGORITHMS, ACDA21, 2021, : 123 - 134
  • [22] yacrd and fpa: upstream tools for long-read genome assembly
    Marijon, Pierre
    Chikhi, Rayan
    Varre, Jean-Stephane
    BIOINFORMATICS, 2020, 36 (12) : 3894 - 3896
  • [23] HINGE: long-read assembly achieves optimal repeat resolution
    Kamath, Govinda M.
    Shomorony, Ilan
    Xia, Fei
    Courtade, Thomas A.
    Tse, David N.
    GENOME RESEARCH, 2017, 27 (05) : 747 - 756
  • [24] Gapless assembly of maize chromosomes using long-read technologies
    Liu, Jianing
    Seetharam, Arun S.
    Chougule, Kapeel
    Ou, Shujun
    Swentowsky, Kyle W.
    Gent, Jonathan I.
    Llaca, Victor
    Woodhouse, Margaret R.
    Manchanda, Nancy
    Presting, Gernot G.
    Kudrna, David A.
    Alabady, Magdy
    Hirsch, Candice N.
    Fengler, Kevin A.
    Ware, Doreen
    Michael, Todd P.
    Hufford, Matthew B.
    Dawe, R. Kelly
    GENOME BIOLOGY, 2020, 21 (01)
  • [25] Accurate long-read de novo assembly evaluation with Inspector
    Chen, Yu
    Zhang, Yixin
    Wang, Amy Y.
    Gao, Min
    Chong, Zechen
    GENOME BIOLOGY, 2021, 22 (01)
  • [26] Comparison of long-read methods for sequencing and assembly of a plant genome
    Murigneux, Valentine
    Rai, Subash Kumar
    Furtado, Agnelo
    Bruxner, Timothy J. C.
    Tian, Wei
    Harliwong, Ivon
    Wei, Hanmin
    Yang, Bicheng
    Ye, Qianyu
    Anderson, Ellis
    Mao, Qing
    Drmanac, Radoje
    Wang, Ou
    Peters, Brock A.
    Xu, Mengyang
    Wu, Pei
    Topp, Bruce
    Coin, Lachlan J. M.
    Henry, Robert J.
    GIGASCIENCE, 2020, 9 (12)
  • [27] Fast and accurate long-read assembly with wtdbg2
    Jue Ruan
    Heng Li
    Nature Methods, 2020, 17 : 155 - 158
  • [28] Snakemake workflows for long-read bacterial genome assembly and evaluation
    Menzel, Peter
    GIGABYTE, 2024, 2024 : 1 - 7
  • [29] Long-read sequencing and de novo assembly of a Chinese genome
    Lingling Shi
    Yunfei Guo
    Chengliang Dong
    John Huddleston
    Hui Yang
    Xiaolu Han
    Aisi Fu
    Quan Li
    Na Li
    Siyi Gong
    Katherine E. Lintner
    Qiong Ding
    Zou Wang
    Jiang Hu
    Depeng Wang
    Feng Wang
    Lin Wang
    Gholson J. Lyon
    Yongtao Guan
    Yufeng Shen
    Oleg V. Evgrafov
    James A. Knowles
    Francoise Thibaud-Nissen
    Valerie Schneider
    Chack-Yung Yu
    Libing Zhou
    Evan E. Eichler
    Kwok-Fai So
    Kai Wang
    Nature Communications, 7
  • [30] Fast and accurate long-read assembly with wtdbg2
    Ruan, Jue
    Li, Heng
    NATURE METHODS, 2020, 17 (02) : 155 - +