Evaluating long-read de novo assembly tools for eukaryotic genomes: insights and considerations

Cited by: 7
Authors
Cosma, Bianca-Maria [1]
Zade, Ramin Shirali Hossein [1]
Jordan, Erin Noel [1,2]
van Lent, Paul [1]
Peng, Chengyao [1]
Pillay, Stephanie [1]
Abeel, Thomas [1,3]
Affiliations
[1] Delft Univ Technol, Delft Bioinformat Lab, Intelligent Syst, NL-2628 XE Delft, Netherlands
[2] TU Dortmund Univ, Tech Biochem, D-44227 Dortmund, Germany
[3] Broad Inst MIT & Harvard, Infect Dis & Microbiome Program, Cambridge, MA 02142 USA
Source
GigaScience | 2023 / Volume 12
Keywords
de novo assembly; third-generation sequencing; benchmarking; eukaryote genomes
DOI
10.1093/gigascience/giad100
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Assembly algorithm choice should be a deliberate, well-justified decision when researchers create genome assemblies for eukaryotic organisms from third-generation sequencing technologies. While third-generation sequencing by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) has overcome the disadvantages of short read lengths specific to next-generation sequencing (NGS), third-generation sequencers are known to produce more error-prone reads, thereby generating a new set of challenges for assembly algorithms and pipelines. However, the introduction of HiFi reads, which offer substantially reduced error rates, has provided a promising solution for more accurate assembly outcomes. Since the introduction of third-generation sequencing technologies, many tools have been developed that aim to take advantage of the longer reads, and researchers need to choose the correct assembler for their projects.

Results: We benchmarked state-of-the-art long-read de novo assemblers to help readers make a balanced choice for the assembly of eukaryotes. To this end, we used 12 real and 64 simulated datasets from different eukaryotic genomes, with different read length distributions, imitating PacBio continuous long-read (CLR), PacBio high-fidelity (HiFi), and ONT sequencing to evaluate the assemblers. We include 5 commonly used long-read assemblers in our benchmark: Canu, Flye, Miniasm, Raven, and wtdbg2 for ONT and PacBio CLR reads. For PacBio HiFi reads, we include 5 state-of-the-art HiFi assemblers: HiCanu, Flye, Hifiasm, LJA, and MBG. Evaluation categories address the following metrics: reference-based metrics, assembly statistics, misassembly count, BUSCO completeness, runtime, and RAM usage. Additionally, we investigated the effect of increased read length on the quality of the assemblies and report that read length can, but does not always, positively impact assembly quality.

Conclusions: Our benchmark concludes that there is no assembler that performs the best in all the evaluation categories. However, our results show that overall Flye is the best-performing assembler for PacBio CLR and ONT reads, both on real and simulated data. Meanwhile, the best-performing PacBio HiFi assemblers are Hifiasm and LJA. Next, the benchmarking using longer reads shows that the increased read length improves assembly quality, but the extent to which that can be achieved depends on the size and complexity of the reference genome.
Pages: 12