Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm

Cited by: 1523
Authors
Cheng, Haoyu [1,2]
Concepcion, Gregory T. [3]
Feng, Xiaowen [1,2]
Zhang, Haowen [4]
Li, Heng [1,2]
Affiliations
[1] Dana Farber Canc Inst, Dept Data Sci, Boston, MA 02115 USA
[2] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA
[3] Pacific Biosci, Menlo Pk, CA USA
[4] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Funding
US National Institutes of Health
Keywords
GENOME; ACCURATE; READS;
DOI
10.1038/s41592-020-01056-5
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five nonhuman datasets, including California redwood with a ~30-Gb hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly.
Pages: 170 / +
Number of pages: 10