Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm

Cited by: 1827
Authors
Cheng, Haoyu [1, 2]
Concepcion, Gregory T. [3]
Feng, Xiaowen [1, 2]
Zhang, Haowen [4]
Li, Heng [1, 2]
Affiliations
[1] Dana Farber Canc Inst, Dept Data Sci, Boston, MA 02115 USA
[2] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA
[3] Pacific Biosci, Menlo Pk, CA USA
[4] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
GENOME; ACCURATE; READS;
DOI
10.1038/s41592-020-01056-5
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five nonhuman datasets, including California redwood with a ~30-Gb hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly.
Pages: 170 / +
Number of pages: 10