GTasm: a genome assembly method using graph transformers and HiFi reads

Cited by: 0
Authors
Luo, Junwei [1 ]
Zhang, Ziheng [1 ]
Ma, Xinliang [1 ]
Yan, Chaokun [2 ]
Luo, Huimin [2 ]
Affiliations
[1] Henan Polytech Univ, Sch Software, Jiaozuo, Peoples R China
[2] Henan Univ, Sch Comp & Informat Engn, Kaifeng, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
genome assembly; graph transformer; HiFi read; deep learning; sequencing technique; SEQUENCE;
DOI
10.3389/fgene.2024.1495657
Chinese Library Classification
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Motivation: Genome assembly aims to reconstruct the whole chromosome-scale genome sequence. An accurate and complete chromosome-scale genome sequence serves as an indispensable foundation for downstream genomics analyses. Because genome sequences contain complex repeat regions, assembly results are commonly fragmented. Long reads with a high accuracy rate can greatly enhance the integrity of genome assembly results.
Results: Here we introduce GTasm, an assembly method that uses a graph transformer network to find optimal assembly results based on assembly graphs. Starting from the assembly graph, GTasm first extracts features of the vertices and edges. GTasm then scores the edges with the graph transformer model and adopts a heuristic algorithm to find optimal paths in the assembly graph, with each path corresponding to a contig. The graph transformer model is trained on simulated HiFi reads from CHM13, and GTasm is compared with other assembly methods on real HiFi read sets. In the experiments, GTasm produces good assembly results and performs well on the NA50 and NGA50 evaluation metrics. Applying deep learning models to genome assembly can improve the continuity and accuracy of assembly results. The code is available from https://github.com/chu-xuezhe/GTasm.
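The abstract does not specify the heuristic used to turn edge scores into paths. As an illustration only, the following minimal Python sketch shows one generic greedy strategy: seed a path at the highest-scoring unused edge, then extend it forward and backward along the best-scoring edges to unvisited vertices. The function name `greedy_paths`, the `min_score` cutoff of 0.5, and the toy scores are all assumptions made for this example and are not taken from GTasm.

```python
from collections import defaultdict, deque


def greedy_paths(edge_scores, min_score=0.5):
    """Build vertex paths greedily from scored directed edges.

    edge_scores: dict mapping (u, v) -> score; higher means more trusted.
    min_score: hypothetical cutoff; edges scoring below it are ignored.
    """
    succ = defaultdict(list)  # u -> [(score, v)] outgoing edges
    pred = defaultdict(list)  # v -> [(score, u)] incoming edges
    for (u, v), score in edge_scores.items():
        if score >= min_score and u != v:
            succ[u].append((score, v))
            pred[v].append((score, u))

    def best_unvisited(candidates, visited):
        # Pick the neighbour reached by the highest-scoring usable edge.
        options = [c for c in candidates if c[1] not in visited]
        return max(options, key=lambda c: c[0])[1] if options else None

    visited, paths = set(), []
    ranked = sorted(edge_scores.items(), key=lambda kv: kv[1], reverse=True)
    for (u, v), score in ranked:
        if score < min_score or u == v or u in visited or v in visited:
            continue
        path = deque([u, v])  # seed the path with the best remaining edge
        visited.update((u, v))
        # Extend forward from the last vertex.
        while (nxt := best_unvisited(succ[path[-1]], visited)) is not None:
            path.append(nxt)
            visited.add(nxt)
        # Extend backward from the first vertex.
        while (prv := best_unvisited(pred[path[0]], visited)) is not None:
            path.appendleft(prv)
            visited.add(prv)
        paths.append(list(path))
    return paths


# Toy scores standing in for model output (made up for this example).
scores = {
    ("a", "b"): 0.8,
    ("b", "c"): 0.9,
    ("c", "d"): 0.7,
    ("b", "x"): 0.2,  # low score: treated as a spurious edge
    ("e", "f"): 0.6,
}
print(greedy_paths(scores))  # [['a', 'b', 'c', 'd'], ['e', 'f']]
```

Each returned path would correspond to one contig in this simplified picture. A real assembler would also have to handle reverse-complement strands and overlap lengths, which this sketch leaves out.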
Pages: 11