Transformer-based embedding applied to classify bacterial species using sequencing reads

Authors
Gwak, Ho-Jin [1]
Rho, Mina [1]
Affiliations
[1] Hanyang Univ, Dept Comp Sci, Seoul, South Korea
Source
2022 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (IEEE BIGCOMP 2022) | 2022
Keywords
embedding; transformer; deep learning; classification; Staphylococcus species; SOFTWARE
DOI
10.1109/BigComp54360.2022.00084
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the emergence of next-generation sequencing and metagenomic approaches, the need for read-level taxonomic classifiers has increased. Although the 16S rRNA gene sequence has been widely employed as a taxonomic marker, recent studies have revealed that 16S rRNA is not sufficient for species-level assignment. Therefore, an accurate classifier is required to classify whole-genome sequencing reads into species. With the advancement of deep learning methods and natural language processing technologies, several studies have applied these methods to genomic data and achieved state-of-the-art performance. In this study, we applied transformer-based embedding to bacterial genomes to accurately classify species using sequencing reads. As a case study, we classified Staphylococcus species using sequencing reads. Our model achieved ROC-AUC values of over 0.98 and 0.99 for 151 bp and 251 bp paired-end reads, respectively. Compared with the cutting-edge method Kraken2, our model classified significantly more S. aureus reads while maintaining comparable precision.
Pages: 374-377
Number of pages: 4