GTasm: a genome assembly method using graph transformers and HiFi reads

Authors
Luo, Junwei [1]
Zhang, Ziheng [1]
Ma, Xinliang [1]
Yan, Chaokun [2]
Luo, Huimin [2]
Affiliations
[1] Henan Polytech Univ, Sch Software, Jiaozuo, Peoples R China
[2] Henan Univ, Sch Comp & Informat Engn, Kaifeng, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
genome assembly; graph transformer; HiFi read; deep learning; sequencing technique; SEQUENCE;
DOI
10.3389/fgene.2024.1495657
Chinese Library Classification
Q3 [Genetics];
Discipline classification codes
071007; 090102;
Abstract
Motivation: Genome assembly aims to reconstruct the whole chromosome-scale genome sequence. An accurate and complete chromosome-scale genome sequence serves as an indispensable foundation for downstream genomics analyses. Because genome sequences contain complex repeat regions, assembly results are commonly fragmented. Long reads with a high accuracy rate can greatly enhance the integrity of genome assembly results. Results: Here we introduce GTasm, an assembly method that uses a graph transformer network to find optimal assembly results based on assembly graphs. Given an assembly graph, GTasm first extracts features of its vertices and edges. GTasm then scores the edges with the graph transformer model and adopts a heuristic algorithm to find optimal paths in the assembly graph, each path corresponding to a contig. The graph transformer model is trained using simulated HiFi reads from CHM13, and GTasm is compared with other assembly methods using a real HiFi read set. The experimental results show that GTasm produces good assembly results and achieves good performance on the NA50 and NGA50 evaluation metrics. Applying deep learning models to genome assembly can improve the continuity and accuracy of assembly results. The code is available from https://github.com/chu-xuezhe/GTasm.
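The abstract does not spell out the heuristic used to turn edge scores into contig paths. As a rough illustration only, the sketch below assumes a simple greedy strategy: seed a path at the highest-scoring unused edge, then extend it forward and backward by always taking the best-scoring edge to an unvisited vertex. The function name `greedy_paths`, the `(u, v) -> score` input format, and the greedy rule itself are assumptions made for this example and are not taken from GTasm.

```python
from collections import defaultdict


def greedy_paths(edge_scores):
    """Greedily extract vertex-disjoint paths from a scored directed graph.

    edge_scores: dict mapping (u, v) -> score, higher meaning "more likely
    a correct assembly edge". Hypothetical input format, not GTasm's.
    Returns a list of paths, each a list of vertices (one path per contig).
    """
    succ = defaultdict(list)  # u -> [(score, v), ...]
    pred = defaultdict(list)  # v -> [(score, u), ...]
    for (u, v), score in edge_scores.items():
        succ[u].append((score, v))
        pred[v].append((score, u))

    visited = set()
    paths = []
    # Try edges as seeds in order of decreasing score.
    for (u, v), _ in sorted(edge_scores.items(), key=lambda kv: -kv[1]):
        if u == v or u in visited or v in visited:
            continue
        path = [u, v]
        visited.update(path)

        # Extend forward from the last vertex while an unvisited successor exists.
        while True:
            candidates = [(s, w) for s, w in succ[path[-1]] if w not in visited]
            if not candidates:
                break
            _, best = max(candidates)
            path.append(best)
            visited.add(best)

        # Extend backward from the first vertex in the same way.
        while True:
            candidates = [(s, w) for s, w in pred[path[0]] if w not in visited]
            if not candidates:
                break
            _, best = max(candidates)
            path.insert(0, best)
            visited.add(best)

        paths.append(path)
    return paths


if __name__ == "__main__":
    scores = {("a", "b"): 0.9, ("b", "c"): 0.8, ("b", "d"): 0.2, ("d", "e"): 0.7}
    print(greedy_paths(scores))  # [['a', 'b', 'c'], ['d', 'e']]
```

In this toy graph the low-scoring edge from b to d is never followed, so the graph splits into two paths, which would correspond to two contigs. A real assembler would also need to handle reverse-complement strands and overlap lengths, which this sketch ignores.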
Pages: 11