Benchmarking of long-read sequencing, assemblers and polishers for yeast genome

Times cited: 17
Authors
Zhang, Xue [1]
Liu, Chen-Guang [1]
Yang, Shi-Hui [2]
Wang, Xia [2]
Bai, Feng-Wu [1]
Wang, Zhuo [3]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Life Sci & Biotechnol, Shanghai, Peoples R China
[2] Hubei Univ, Sch Life Sci, Wuhan, Peoples R China
[3] Shanghai Jiao Tong Univ, Sch Life Sci & Biotechnol, Bio X Inst, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
de novo assembly; long-read sequencing; benchmarking; yeast; data depth; genome analysis; ACCURATE;
DOI
10.1093/bib/bbac146
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Long reads from third-generation sequencing substantially improve the quality of de novo genome assemblies, although their relatively high single-base error rate has drawn criticism. Sequencing accuracy and throughput continue to improve, and new tools keep emerging. PacBio HiFi sequencing and Oxford Nanopore Technologies (ONT) PromethION are two current platforms with low error rates and ultralong, high-throughput reads. In an era of explosive data production, guidance is therefore urgently needed on selecting sequencing platforms, depths and genome assembly tools for high-quality genomes.
Methods: We performed 455 de novo assemblies of yeast S288C on a high-coverage ONT dataset (7 assemblers, each with one of 4 polishing pipelines or without polishing, on 13 subsets of different depths) and 88 on a high-coverage HiFi dataset (4 assemblers, with or without polishing, on 11 subsets of different depths). Assembly quality was evaluated with the Quality Assessment Tool (QUAST), Benchmarking Universal Single-Copy Orthologs (BUSCO) and the newly proposed Comprehensive_score (C_score). In addition, we applied the four preferable pipelines to assemble the genomes of nonreference yeast strains.
Results: The assembler plays an essential role in genome construction, especially for low-depth datasets. For ONT datasets, Flye outperforms the other tools according to the C_score evaluation. Polishing with Pilon and Medaka improves the accuracy and continuity of the preassemblies, respectively, and the pipeline combining them performed well on most quality metrics. For HiFi datasets, Flye and NextDenovo performed better than the other tools, and polishing is also necessary. Sufficient data depth is required for high-quality genome construction: >80X for ONT and >20X for HiFi datasets.
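The assembly counts stated in the Methods can be checked by enumerating the experimental design as a full grid of assembler, polishing option and depth subset. The following is a minimal sketch, assuming every combination was run exactly once, which is how the abstract's parenthetical breakdown reads. The function name `count_assemblies` and its parameters are illustrative and do not come from the paper.

```python
from itertools import product


def count_assemblies(n_assemblers, n_polishing_options, n_depth_subsets):
    """Count the runs in a full factorial design (hypothetical helper).

    Each run is one (assembler, polishing option, depth subset) triple;
    "no polishing" is counted as one of the polishing options.
    """
    grid = product(
        range(n_assemblers),
        range(n_polishing_options),
        range(n_depth_subsets),
    )
    return sum(1 for _ in grid)


# ONT: 7 assemblers x (4 polishing pipelines + unpolished) x 13 depth subsets
ont_total = count_assemblies(7, 4 + 1, 13)

# HiFi: 4 assemblers x (polished or unpolished) x 11 depth subsets
hifi_total = count_assemblies(4, 2, 11)

print(ont_total, hifi_total)
```

Under that assumption the grid sizes agree with the 455 ONT and 88 HiFi assemblies reported in the abstract.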
Pages: 13