New algorithms for accurate and efficient de novo genome assembly from long DNA sequencing reads

Cited by: 3
Authors
Gonzalez-Garcia, Laura [1]
Guevara-Barrientos, David [1]
Lozano-Arce, Daniela [1]
Gil, Juanita [2]
Diaz-Riano, Jorge [1]
Duarte, Erick [1]
Andrade, German [1]
Camilo Bojaca, Juan [1]
Camila Hoyos-Sanchez, Maria [1]
Chavarro, Christian [1]
Guayazan, Natalia [3]
Chica, Luis Alberto [4, 5]
Buitrago Acosta, Maria Camila [1]
Bautista, Edwin [1]
Trujillo, Miller [1]
Duitama, Jorge [1]
Affiliations
[1] Univ Andes, Syst & Comp Engn Dept, Bogota, Colombia
[2] Univ Arkansas, Dept Entomol & Plant Pathol, Fayetteville, AR USA
[3] Univ Andes, Dept Biol Sci, Bogota, Colombia
[4] Univ Andes, Dept Biol Sci, Res Grp Computat Biol & Microbial Ecol, Bogota, Colombia
[5] Univ Andes, Max Planck Tandem Grp Computat Biol, Bogota, Colombia
Keywords
SINGLE; ANNOTATION;
DOI
10.26508/lsa.202201719
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07; 0710; 09;
Abstract
Building de novo genome assemblies for complex genomes is possible thanks to long-read DNA sequencing technologies. However, maximizing the quality of assemblies based on long reads is a challenging task that requires the development of specialized data analysis techniques. We present new algorithms for assembling long DNA sequencing reads from haploid and diploid organisms. The assembly algorithm builds an undirected graph with two vertices for each read, based on minimizers selected by a hash function derived from the k-mer distribution. Statistics collected during graph construction are used as features to build layout paths by selecting edges ranked by a likelihood function. For diploid samples, we integrated a reimplementation of the ReFHap algorithm to perform molecular phasing. We ran the implemented algorithms on PacBio HiFi and Nanopore sequencing data taken from haploid and diploid samples of different species. Our algorithms showed competitive accuracy and computational efficiency compared with other currently used software. We expect that this new development will be useful for researchers building genome assemblies for different species.
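The minimizer step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract only states that the hash function is derived from the k-mer distribution, so the ordering used here (rarer k-mers preferred, ties broken lexicographically) is an assumption made purely for illustration, and the function names `kmer_counts` and `minimizers` are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def minimizers(seq, k, w, counts):
    """Return the sorted (position, k-mer) minimizers of seq.

    For each window of w consecutive k-mers, the k-mer with the lowest
    rank is selected. The rank below is an ASSUMED stand-in for a hash
    derived from the k-mer distribution: lower count first, then
    lexicographic order.
    """
    def rank(kmer):
        return (counts.get(kmer, 0), kmer)

    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        # Leftmost position wins when the same k-mer repeats in a window.
        best = min(window, key=lambda i: (rank(kmers[i]), i))
        selected.add((best, kmers[best]))
    return sorted(selected)


# Two toy reads that overlap by ten bases.
reads = ["ACGTTGCATGCA", "GTTGCATGCAAT"]
counts = kmer_counts(reads, k=3)
shared = ({km for _, km in minimizers(reads[0], 3, 4, counts)}
          & {km for _, km in minimizers(reads[1], 3, 4, counts)})
```

Because the rank depends only on the k-mer itself and on global counts, two reads that share a stretch of at least w + k - 1 bases select at least one common minimizer inside it, which is what lets overlapping reads be found without comparing all pairs.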
Pages: 13