GTasm: a genome assembly method using graph transformers and HiFi reads

Cited by: 0
Authors
Luo, Junwei [1]
Zhang, Ziheng [1]
Ma, Xinliang [1]
Yan, Chaokun [2]
Luo, Huimin [2]
Affiliations
[1] Henan Polytech Univ, Sch Software, Jiaozuo, Peoples R China
[2] Henan Univ, Sch Comp & Informat Engn, Kaifeng, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
genome assembly; graph transformer; HiFi read; deep learning; sequencing technique; SEQUENCE;
DOI
10.3389/fgene.2024.1495657
Chinese Library Classification
Q3 [Genetics];
Subject classification codes
071007 ; 090102 ;
Abstract
Motivation: Genome assembly aims to reconstruct the whole chromosome-scale genome sequence. Obtaining an accurate and complete chromosome-scale genome sequence serves as an indispensable foundation for downstream genomics analyses. Because genome sequences contain complex repeat regions, assembly results are commonly fragmented. Long reads with high accuracy can greatly improve the completeness of genome assembly results. Results: Here we introduce GTasm, an assembly method that uses a graph transformer network to find optimal assembly results based on assembly graphs. Given an assembly graph, GTasm first extracts features of the vertices and edges. GTasm then scores the edges with a graph transformer model and adopts a heuristic algorithm to find optimal paths in the assembly graph, with each path corresponding to a contig. The graph transformer model is trained using simulated HiFi reads from CHM13, and GTasm is compared with other assembly methods on a real HiFi read set. The experimental results show that GTasm produces good assembly results and achieves good performance on the NA50 and NGA50 evaluation metrics. Applying deep learning models to genome assembly can improve the continuity and accuracy of assembly results. The code is available at https://github.com/chu-xuezhe/GTasm.
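The abstract describes scoring edges and then searching for paths heuristically, but it does not specify the heuristic. The following is a minimal Python sketch of one simple possibility, a greedy edge-chaining pass, offered only as an illustration. The function name `greedy_paths`, the `min_score` threshold, and the dictionary of edge scores are all hypothetical; none of them comes from GTasm, and the sketch ignores read orientation and overlap lengths that a real assembly graph would carry.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # directed edge (source read, target read)


def greedy_paths(scores: Dict[Edge, float], min_score: float = 0.5) -> List[List[str]]:
    """Greedily chain high-scoring edges into vertex-disjoint paths.

    Illustrative only: `scores` stands in for model-predicted edge scores,
    and `min_score` is an assumed cut-off, not a GTasm parameter.
    """
    succ: Dict[str, str] = {}  # chosen successor of each vertex
    pred: Dict[str, str] = {}  # chosen predecessor of each vertex

    # Visit edges from highest to lowest score (ties broken by edge name).
    for (u, v), s in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if s < min_score:
            break  # all remaining edges score lower still
        if u == v or u in succ or v in pred:
            continue  # each vertex keeps at most one successor and one predecessor
        # Skip the edge if it would close a cycle: follow the chain from v.
        node = v
        while node in succ:
            node = succ[node]
        if node == u:
            continue
        succ[u] = v
        pred[v] = u

    # Every chain starts at a vertex with no chosen predecessor.
    vertices = {n for edge in scores for n in edge}
    paths: List[List[str]] = []
    for start in sorted(n for n in vertices if n not in pred):
        path = [start]
        while path[-1] in succ:
            path.append(succ[path[-1]])
        paths.append(path)
    return paths


if __name__ == "__main__":
    demo = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.95, ("c", "d"): 0.7}
    for p in greedy_paths(demo):
        print(" -> ".join(p))
```

In this sketch each returned path plays the role of a contig in the abstract's description. A vertex that no accepted edge touches comes back as a single-vertex path.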
Pages: 11