GTasm: a genome assembly method using graph transformers and HiFi reads

Authors
Luo, Junwei [1 ]
Zhang, Ziheng [1 ]
Ma, Xinliang [1 ]
Yan, Chaokun [2 ]
Luo, Huimin [2 ]
Affiliations
[1] Henan Polytech Univ, Sch Software, Jiaozuo, Peoples R China
[2] Henan Univ, Sch Comp & Informat Engn, Kaifeng, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
genome assembly; graph transformer; HiFi read; deep learning; sequencing technique; SEQUENCE;
DOI
10.3389/fgene.2024.1495657
Chinese Library Classification
Q3 [Genetics];
Discipline classification codes
071007 ; 090102 ;
Abstract
Motivation: Genome assembly aims to reconstruct the complete chromosome-scale genome sequence. An accurate and complete chromosome-scale genome sequence serves as an indispensable foundation for downstream genomic analyses. Because genome sequences contain complex repeat regions, assembly results are commonly fragmented. Long reads with a high accuracy rate can greatly improve the completeness of genome assembly results. Results: Here we introduce GTasm, an assembly method that uses a graph transformer network to find optimal assembly results based on assembly graphs. Starting from an assembly graph, GTasm first extracts features of the vertices and edges. GTasm then scores the edges with a graph transformer model and adopts a heuristic algorithm to find optimal paths in the assembly graph, with each path corresponding to a contig. The graph transformer model is trained using simulated HiFi reads from CHM13, and GTasm is compared with other assembly methods using a real HiFi read set. The experimental results show that GTasm produces high-quality assemblies and performs well on the NA50 and NGA50 evaluation metrics. Applying deep learning models to genome assembly can improve the continuity and accuracy of assembly results. The code is available from https://github.com/chu-xuezhe/GTasm.
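
The "score the edges, then walk the graph" step described in the abstract can be illustrated with a small sketch. This is not GTasm's actual heuristic, which the abstract does not specify. It is a plain greedy walk, written under the assumption that each directed edge of the assembly graph already carries a score in [0, 1] from some model. The function name `greedy_paths`, the `min_score` threshold, and the toy scores are all hypothetical.

```python
from collections import defaultdict


def greedy_paths(edge_scores, min_score=0.5):
    """Greedily extract vertex-disjoint paths from a scored directed graph.

    edge_scores: dict mapping (u, v) -> score (higher means more trusted).
    Hypothetical illustration only; not the heuristic used by GTasm.
    """
    succ = defaultdict(list)  # u -> [(score, v)]
    pred = defaultdict(list)  # v -> [(score, u)]
    for (u, v), s in edge_scores.items():
        if s >= min_score:  # ignore edges scored below the threshold
            succ[u].append((s, v))
            pred[v].append((s, u))

    visited = set()
    paths = []
    # Seed each path from the best-scoring edge whose endpoints are unused.
    for (u, v), s in sorted(edge_scores.items(), key=lambda kv: -kv[1]):
        if s < min_score or u in visited or v in visited:
            continue
        path = [u, v]
        visited.update(path)
        # Extend forward along the best-scoring unvisited successor.
        while True:
            cands = [c for c in succ[path[-1]] if c[1] not in visited]
            if not cands:
                break
            nxt = max(cands, key=lambda c: c[0])[1]
            path.append(nxt)
            visited.add(nxt)
        # Extend backward along the best-scoring unvisited predecessor.
        while True:
            cands = [c for c in pred[path[0]] if c[1] not in visited]
            if not cands:
                break
            prv = max(cands, key=lambda c: c[0])[1]
            path.insert(0, prv)
            visited.add(prv)
        paths.append(path)
    return paths


# Toy graph with made-up scores: the low-scoring branch b -> d is skipped.
toy_scores = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.8,
    ("b", "d"): 0.3,
    ("c", "e"): 0.7,
    ("x", "y"): 0.6,
}
contig_paths = greedy_paths(toy_scores)
```

In this sketch each returned path is a list of vertices (reads or unitigs) that would be concatenated into one contig. A real assembler would also have to handle strand orientation and overlap lengths, which are left out here.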
Pages: 11