GTasm: a genome assembly method using graph transformers and HiFi reads

Authors
Luo, Junwei [1]
Zhang, Ziheng [1]
Ma, Xinliang [1]
Yan, Chaokun [2]
Luo, Huimin [2]
Affiliations
[1] Henan Polytech Univ, Sch Software, Jiaozuo, Peoples R China
[2] Henan Univ, Sch Comp & Informat Engn, Kaifeng, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
genome assembly; graph transformer; HiFi read; deep learning; sequencing technique; SEQUENCE;
DOI
10.3389/fgene.2024.1495657
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Motivation: Genome assembly aims to reconstruct the complete chromosome-scale genome sequence. An accurate and complete chromosome-scale genome sequence serves as an indispensable foundation for downstream genomic analyses. Because genome sequences contain complex repeat regions, assembly results are commonly fragmented. Highly accurate long reads can greatly improve the completeness of genome assembly results. Results: Here we introduce GTasm, an assembly method that uses a graph transformer network to find optimal assembly results based on assembly graphs. Starting from the assembly graph, GTasm first extracts features of the vertices and edges. GTasm then scores the edges with a graph transformer model and adopts a heuristic algorithm to find optimal paths in the assembly graph, with each path corresponding to a contig. The graph transformer model is trained on simulated HiFi reads from CHM13, and GTasm is compared with other assembly methods on a real HiFi read set. In these experiments, GTasm produces good assembly results and performs well on the NA50 and NGA50 evaluation metrics. Applying deep learning models to genome assembly can improve the continuity and accuracy of assembly results. The code is available from https://github.com/chu-xuezhe/GTasm.
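The score-then-trace idea in the abstract can be illustrated with a minimal sketch. This is not GTasm's implementation: the abstract does not specify the heuristic, so the greedy forward walk below, the function name `greedy_paths`, the `min_score` threshold, and the toy scores are all assumptions made for illustration. The sketch takes per-edge scores (standing in for the graph transformer's output) and walks the best-scoring unvisited successor from each seed, so that each resulting path plays the role of one contig.

```python
from collections import defaultdict


def greedy_paths(edge_scores, min_score=0.5):
    """Trace node paths through a directed graph using per-edge scores.

    edge_scores: dict mapping (u, v) -> score, a stand-in for model output.
    min_score:   hypothetical cutoff; edges scoring below it are ignored.
    Returns a list of paths, each a list of node names.
    """
    # Keep only edges that pass the cutoff, grouped by source node.
    successors = defaultdict(list)
    for (u, v), score in edge_scores.items():
        if score >= min_score:
            successors[u].append((score, v))

    visited = set()
    paths = []
    # Seed walks from the highest-scoring edges first.
    for (u, v), score in sorted(edge_scores.items(), key=lambda kv: -kv[1]):
        if score < min_score or u in visited or v in visited:
            continue
        path = [u]
        visited.add(u)
        current = u
        while True:
            # Candidate next nodes: retained successors not yet used.
            candidates = [(s, n) for s, n in successors[current]
                          if n not in visited]
            if not candidates:
                break
            _, nxt = max(candidates)  # follow the best-scoring edge
            path.append(nxt)
            visited.add(nxt)
            current = nxt
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Toy scores, invented for this example only.
    toy_scores = {
        ("a", "b"): 0.9,
        ("b", "c"): 0.8,
        ("b", "d"): 0.2,
        ("d", "e"): 0.95,
    }
    for path in greedy_paths(toy_scores):
        print(" -> ".join(path))
```

The walk only extends forward from each seed and uses every node at most once, which keeps the sketch short; a real assembler would also need to handle reverse-complement strands, backward extension, and repeats.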
Pages: 11