New algorithms for accurate and efficient de novo genome assembly from long DNA sequencing reads

Cited by: 3
Authors
Gonzalez-Garcia, Laura [1]
Guevara-Barrientos, David [1]
Lozano-Arce, Daniela [1]
Gil, Juanita [2]
Diaz-Riano, Jorge [1]
Duarte, Erick [1]
Andrade, German [1]
Camilo Bojaca, Juan [1]
Camila Hoyos-Sanchez, Maria [1]
Chavarro, Christian [1]
Guayazan, Natalia [3]
Chica, Luis Alberto [4,5]
Buitrago Acosta, Maria Camila [1]
Bautista, Edwin [1]
Trujillo, Miller [1]
Duitama, Jorge [1]
Affiliations
[1] Univ Andes, Syst & Comp Engn Dept, Bogota, Colombia
[2] Univ Arkansas, Dept Entomol & Plant Pathol, Fayetteville, AR USA
[3] Univ Andes, Dept Biol Sci, Bogota, Colombia
[4] Univ Andes, Dept Biol Sci, Res Grp Computat Biol & Microbial Ecol, Bogota, Colombia
[5] Univ Andes, Max Planck Tandem Grp Computat Biol, Bogota, Colombia
Keywords
SINGLE; ANNOTATION
DOI
10.26508/lsa.202201719
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Long-read DNA sequencing technologies have made it possible to build de novo assemblies of complex genomes. However, maximizing the quality of assemblies built from long reads is challenging and requires specialized data analysis techniques. We present new algorithms for assembling long DNA sequencing reads from haploid and diploid organisms. The assembly algorithm builds an undirected graph with two vertices for each read, based on minimizers selected by a hash function derived from the k-mer distribution. Statistics collected during graph construction are used as features to build layout paths by selecting edges ranked by a likelihood function. For diploid samples, we integrated a reimplementation of the ReFHap algorithm to perform molecular phasing. We ran the implemented algorithms on PacBio HiFi and Nanopore sequencing data from haploid and diploid samples of different species. Our algorithms showed competitive accuracy and computational efficiency compared with other currently used software. We expect this development to be useful for researchers building genome assemblies for different species.
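The abstract says minimizers are selected with a hash function derived from the k-mer distribution, but it does not give that function. The sketch below is a hypothetical illustration of the general idea, not the authors' implementation: it assumes canonical k-mers are ranked by ascending frequency across all reads (ties broken lexicographically), so that frequent, likely repetitive k-mers rarely become minimizers. The names `build_rank`, `minimizers`, `k` and `w` are illustrative choices, not taken from the paper.

```python
from collections import Counter

# Complement table for reverse-complementing DNA k-mers.
_COMP = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMP)[::-1]
    return min(kmer, rc)


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_rank(reads: list[str], k: int) -> dict[str, int]:
    """Hypothetical frequency-aware 'hash' (an assumption, not the paper's):
    rarer canonical k-mers get smaller ranks, ties broken lexicographically."""
    counts = Counter(canonical(km) for r in reads for km in kmers(r, k))
    ordered = sorted(counts, key=lambda km: (counts[km], km))
    return {km: i for i, km in enumerate(ordered)}


def minimizers(seq: str, k: int, w: int, rank: dict[str, int]) -> list[tuple[int, str]]:
    """(w, k)-minimizers: in every window of w consecutive k-mers keep the one
    with the smallest rank (leftmost on ties). Returns sorted
    (position, canonical k-mer) pairs."""
    kms = [canonical(km) for km in kmers(seq, k)]
    if not kms:
        return []
    w = min(w, len(kms))
    worst = len(rank)  # k-mers absent from the rank table are ranked last
    chosen = set()
    for start in range(len(kms) - w + 1):
        best = min(range(start, start + w),
                   key=lambda i: (rank.get(kms[i], worst), i))
        chosen.add(best)
    return [(i, kms[i]) for i in sorted(chosen)]


if __name__ == "__main__":
    reads = ["ACGTACGTTTGACCA", "CGTTTGACCATTAG"]
    rank = build_rank(reads, k=4)
    print(minimizers(reads[0], k=4, w=3, rank=rank))
```

Ranking by frequency rather than by a random hash trades uniform sampling for fewer minimizers in repeats. Real assemblers typically also cap or discard very high-frequency k-mers, which this sketch does not attempt.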
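The graph described in the abstract has two vertices per read, and layout paths are built by selecting edges ranked by a likelihood function. The sketch below shows one generic way such a structure can be turned into layout paths. It is a greedy stand-in under stated assumptions, not the authors' method: the likelihood function is replaced by a precomputed `score` on each hypothetical `Overlap`, each read end ("S" for start, "E" for end, labels chosen here for illustration) may join at most one overlap edge, and union-find rejects edges that would close a cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Overlap:
    u: tuple[int, str]  # (read id, "S" or "E"): hypothetical read-end labels
    v: tuple[int, str]
    score: float        # stand-in for the paper's likelihood-based ranking


def _find(parent: dict, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def greedy_layout(n_reads: int, overlaps: list[Overlap]) -> list[list[tuple[int, str]]]:
    # Each read i has vertices (i, "S") and (i, "E"), already joined by the read itself.
    parent = {}
    for i in range(n_reads):
        parent[(i, "S")] = (i, "S")
        parent[(i, "E")] = (i, "S")
    partner = {}  # read end -> read end on the other side of its chosen overlap
    for ov in sorted(overlaps, key=lambda o: o.score, reverse=True):
        if ov.u in partner or ov.v in partner:
            continue  # each read end takes at most one overlap edge
        ru, rv = _find(parent, ov.u), _find(parent, ov.v)
        if ru == rv:
            continue  # edge would close a cycle
        parent[ru] = rv
        partner[ov.u], partner[ov.v] = ov.v, ov.u
    # Walk each path from a free read end; "+" = read used forward, "-" = reversed.
    other = {"S": "E", "E": "S"}
    paths, used = [], set()
    for i in range(n_reads):
        for side in ("S", "E"):
            v = (i, side)
            if i in used or v in partner:
                continue
            path = []
            while True:
                rid, s = v
                path.append((rid, "+" if s == "S" else "-"))
                used.add(rid)
                exit_end = (rid, other[s])
                if exit_end not in partner:
                    break
                v = partner[exit_end]
            paths.append(path)
    return paths


if __name__ == "__main__":
    ovs = [Overlap((0, "E"), (1, "S"), 0.9), Overlap((1, "E"), (2, "S"), 0.8),
           Overlap((0, "E"), (2, "S"), 0.5), Overlap((2, "E"), (0, "S"), 0.4)]
    print(greedy_layout(3, ovs))  # prints [[(0, '+'), (1, '+'), (2, '+')]]
```

Keeping two vertices per read means orientation falls out of the walk for free: entering a read at its end vertex implies the read is used in reverse complement.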
Pages: 13