New algorithms for accurate and efficient de novo genome assembly from long DNA sequencing reads

Cited by: 3
Authors
Gonzalez-Garcia, Laura [1]
Guevara-Barrientos, David [1]
Lozano-Arce, Daniela [1]
Gil, Juanita [2]
Diaz-Riano, Jorge [1]
Duarte, Erick [1]
Andrade, German [1]
Camilo Bojaca, Juan [1]
Camila Hoyos-Sanchez, Maria [1]
Chavarro, Christian [1]
Guayazan, Natalia [3]
Chica, Luis Alberto [4, 5]
Buitrago Acosta, Maria Camila [1]
Bautista, Edwin [1]
Trujillo, Miller [1]
Duitama, Jorge [1]
Affiliations
[1] Univ Andes, Syst & Comp Engn Dept, Bogota, Colombia
[2] Univ Arkansas, Dept Entomol & Plant Pathol, Fayetteville, AR USA
[3] Univ Andes, Dept Biol Sci, Bogota, Colombia
[4] Univ Andes, Dept Biol Sci, Res Grp Computat Biol & Microbial Ecol, Bogota, Colombia
[5] Univ Andes, Max Planck Tandem Grp Computat Biol, Bogota, Colombia
Keywords
SINGLE; ANNOTATION;
DOI
10.26508/lsa.202201719
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Building de novo genome assemblies for complex genomes is possible thanks to long-read DNA sequencing technologies. However, maximizing the quality of assemblies based on long reads is a challenging task that requires the development of specialized data analysis techniques. We present new algorithms for assembling long DNA sequencing reads from haploid and diploid organisms. The assembly algorithm builds an undirected graph with two vertices for each read based on minimizers selected by a hash function derived from the k-mer distribution. Statistics collected during the graph construction are used as features to build layout paths by selecting edges, ranked by a likelihood function. For diploid samples, we integrated a reimplementation of the ReFHap algorithm to perform molecular phasing. We ran the implemented algorithms on PacBio HiFi and Nanopore sequencing data taken from haploid and diploid samples of different species. Our algorithms showed competitive accuracy and computational efficiency, compared with other currently used software. We expect that this new development will be useful for researchers building genome assemblies for different species.
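As a rough illustration of the minimizer idea mentioned in the abstract, the following Python sketch selects, in each window of w consecutive k-mers, the k-mer with the lowest rank, where the rank is derived from k-mer counts across the reads. This is a generic textbook-style sketch under stated assumptions, not the authors' implementation: the function names (`kmers`, `build_rank`, `minimizers`), the choice to prefer rarer k-mers, the lexicographic tie-breaking, and the toy reads are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_rank(reads, k):
    """Build a ranking function from the k-mer counts of the reads.

    Assumption for this sketch: rarer k-mers get a lower (preferred)
    rank, and ties are broken lexicographically.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))

    def rank(kmer):
        return (counts[kmer], kmer)

    return rank


def minimizers(seq, k, w, rank):
    """Return sorted (position, k-mer) pairs, one winner per window.

    Each window covers w consecutive k-mers; the winner is the k-mer
    with the lowest rank (leftmost position on ties).
    """
    kms = kmers(seq, k)
    selected = set()
    for start in range(len(kms) - w + 1):
        best = min(range(start, start + w), key=lambda i: (rank(kms[i]), i))
        selected.add((best, kms[best]))
    return sorted(selected)


# Two toy reads that overlap with an offset of one base (hypothetical data).
reads = ["ACGTACGTTG", "CGTACGTTGA"]
rank = build_rank(reads, k=3)
mins_a = minimizers(reads[0], 3, 4, rank)
mins_b = minimizers(reads[1], 3, 4, rank)
shared = {km for _, km in mins_a} & {km for _, km in mins_b}
```

In this toy case the two reads share minimizers, which is the kind of evidence an overlap graph could use to propose an edge between reads. How the published method derives its hash function, builds the two-vertices-per-read graph, and ranks edges by likelihood is not specified in this record and is not modeled here.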
Pages: 13