Applications of transformer-based language models in bioinformatics: a survey

Cited by: 52
Authors
Zhang, Shuang [1]
Fan, Rui [1]
Liu, Yuti [1]
Chen, Shuang [1]
Liu, Qiao
Zeng, Wanwen [1, 2]
Affiliations
[1] Nankai Univ, Coll Software, Tianjin 300350, Peoples R China
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Funding
National Natural Science Foundation of China
Keywords
GENE-EXPRESSION DATA; PROTEINS;
DOI
10.1093/bioadv/vbad001
Chinese Library Classification number
R73 [Oncology]
Subject classification code
100214
Abstract
The transformer-based language models, including vanilla transformer, BERT and GPT-3, have achieved revolutionary breakthroughs in the field of natural language processing (NLP). Since there are inherent similarities between various biological sequences and natural languages, the remarkable interpretability and adaptability of these models have prompted a new wave of their application in bioinformatics research. To provide a timely and comprehensive review, we introduce key developments of transformer-based language models by describing the detailed structure of transformers and summarize their contribution to a wide range of bioinformatics research from basic sequence analysis to drug discovery. While transformer-based applications in bioinformatics are diverse and multifaceted, we identify and discuss the common challenges, including heterogeneity of training data, computational expense and model interpretability, and opportunities in the context of bioinformatics research. We hope that the broader community of NLP researchers, bioinformaticians and biologists will be brought together to foster future research and development in transformer-based language models, and inspire novel bioinformatics applications that are unattainable by traditional methods.
Pages: 19