Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm

Cited by: 1935
Authors
Cheng, Haoyu [1,2]
Concepcion, Gregory T. [3]
Feng, Xiaowen [1,2]
Zhang, Haowen [4]
Li, Heng [1,2]
Affiliations
[1] Dana-Farber Cancer Institute, Department of Data Science, Boston, MA 02115, USA
[2] Harvard Medical School, Department of Biomedical Informatics, Boston, MA 02115, USA
[3] Pacific Biosciences, Menlo Park, CA, USA
[4] Georgia Institute of Technology, School of Computational Science and Engineering, Atlanta, GA 30332, USA
Funding
National Institutes of Health (USA)
Keywords
GENOME; ACCURATE; READS
DOI
10.1038/s41592-020-01056-5
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five nonhuman datasets, including California redwood with a ~30-Gb hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly.
Pages: 170-+
Number of pages: 10