New algorithms for accurate and efficient de novo genome assembly from long DNA sequencing reads

Cited by: 3
Authors
Gonzalez-Garcia, Laura [1]
Guevara-Barrientos, David [1]
Lozano-Arce, Daniela [1]
Gil, Juanita [2]
Diaz-Riano, Jorge [1]
Duarte, Erick [1]
Andrade, German [1]
Camilo Bojaca, Juan [1]
Camila Hoyos-Sanchez, Maria [1]
Chavarro, Christian [1]
Guayazan, Natalia [3]
Chica, Luis Alberto [4, 5]
Buitrago Acosta, Maria Camila [1]
Bautista, Edwin [1]
Trujillo, Miller [1]
Duitama, Jorge [1]
Affiliations
[1] Univ Andes, Syst & Comp Engn Dept, Bogota, Colombia
[2] Univ Arkansas, Dept Entomol & Plant Pathol, Fayetteville, AR USA
[3] Univ Andes, Dept Biol Sci, Bogota, Colombia
[4] Univ Andes, Dept Biol Sci, Res Grp Computat Biol & Microbial Ecol, Bogota, Colombia
[5] Univ Andes, Max Planck Tandem Grp Computat Biol, Bogota, Colombia
Keywords
SINGLE; ANNOTATION;
DOI
10.26508/lsa.202201719
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Building de novo genome assemblies for complex genomes is possible thanks to long-read DNA sequencing technologies. However, maximizing the quality of assemblies based on long reads is a challenging task that requires the development of specialized data analysis techniques. We present new algorithms for assembling long DNA sequencing reads from haploid and diploid organisms. The assembly algorithm builds an undirected graph with two vertices for each read based on minimizers selected by a hash function derived from the k-mer distribution. Statistics collected during the graph construction are used as features to build layout paths by selecting edges, ranked by a likelihood function. For diploid samples, we integrated a reimplementation of the ReFHap algorithm to perform molecular phasing. We ran the implemented algorithms on PacBio HiFi and Nanopore sequencing data taken from haploid and diploid samples of different species. Our algorithms showed competitive accuracy and computational efficiency, compared with other currently used software. We expect that this new development will be useful for researchers building genome assemblies for different species.
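The abstract does not spell out how minimizers are chosen, so the following is only a generic sketch of windowed minimizer selection, not the authors' implementation. It assumes, purely for illustration, that "a hash function derived from the k-mer distribution" can be approximated by ranking k-mers by their count across all reads (rarer first). The function names `build_rank`, `minimizers`, and `shared_minimizers`, and the parameters `k` and `w`, are hypothetical. The graph construction, likelihood-ranked layout, and phasing steps are not covered.

```python
from collections import Counter


def kmers(seq, k):
    """All k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_rank(reads, k):
    """Return a ranking function over k-mers.

    Assumption for this sketch: rarer k-mers rank lower (are preferred),
    with ties broken lexicographically so the order is deterministic.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))

    def rank(kmer):
        return (counts[kmer], kmer)

    return rank


def minimizers(seq, k, w, rank):
    """Pick the lowest-ranked k-mer in every window of w consecutive k-mers.

    Returns a sorted list of (position, kmer) pairs without duplicates.
    """
    kms = kmers(seq, k)
    selected = set()
    for start in range(len(kms) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (rank(kms[i]), i))
        selected.add((best, kms[best]))
    return sorted(selected)


def shared_minimizers(read_a, read_b, k, w, rank):
    """Minimizer k-mers common to two reads: candidate evidence of overlap."""
    mins_a = {km for _, km in minimizers(read_a, k, w, rank)}
    mins_b = {km for _, km in minimizers(read_b, k, w, rank)}
    return mins_a & mins_b


if __name__ == "__main__":
    # Two toy reads whose suffix/prefix "TACGTGA" overlap.
    reads = ["ACGTACGTGA", "TACGTGACCA"]
    rank = build_rank(reads, k=3)
    print(minimizers(reads[0], 3, 2, rank))
    print(shared_minimizers(reads[0], reads[1], 3, 2, rank))
```

Because the same window content always yields the same minimizer, two reads that share a stretch of at least `w + k - 1` bases are guaranteed to share at least one selected k-mer, which is what makes minimizers useful for finding candidate overlaps without comparing every pair of reads base by base.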
Pages: 13
Related papers
43 in total
  • [41] De novo genome assembly of rice bean (Vigna umbellata) - A nominated nutritionally rich future crop reveals novel insights into flowering potential, habit, and palatability centric - traits for efficient domestication
    Kaul, Tanushri
    Easwaran, Murugesh
    Thangaraj, Arulprakash
    Meyyazhagan, Arun
    Nehra, Mamta
    Raman, Nitya Meenakshi
    Verma, Rachana
    Sony, Sonia Khan
    Abdel, Khaled Fathy
    Bharti, Jyotsna
    Gayacharan
    Badapanda, Chandan
    Balasubramanian, Balamuralikrishnan
    FRONTIERS IN PLANT SCIENCE, 2022, 13
  • [42] De Novo Assembly and Characterization of Bud, Leaf and Flowers Transcriptome from Juglans Regia L. for the Identification and Characterization of New EST-SSRs
    Dang, Meng
    Zhang, Tian
    Hu, Yiheng
    Zhou, Huijuan
    Woeste, Keith E.
    Zhao, Peng
    FORESTS, 2016, 7 (10)
  • [43] Whole Genome Sequences, De Novo Assembly, and Annotation of Antibiotic Resistant Campylobacter jejuni Strains S27, S33, and S36 Newly Isolated from Chicken Meat
    He, Yiping
    Kanrar, Siddhartha
    Reed, Sue
    Lee, Joe
    Capobianco, Joseph
    MICROORGANISMS, 2024, 12 (01)