A review on the applications of Transformer-based language models for nucleotide sequence analysis

Citations: 0
Authors
Ghosh, Nimisha [1]
Santoni, Daniele [2]
Saha, Indrajit [3]
Felici, Giovanni [2]
Affiliations
[1] Shiv Nadar Univ, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
[2] Natl Res Council Italy, Inst Syst Anal & Comp Sci Antonio Ruberti, Rome, Italy
[3] Natl Inst Tech Teachers Training & Res, Dept Comp Sci & Engn, Kolkata, W Bengal, India
Keywords
Bioinformatics; DNA/RNA sequences; Natural language processing; Nucleotide sequences; Transformers; METAGENOMIC DATA; DNA METHYLATION; SITES
DOI
10.1016/j.csbj.2025.03.024
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Transformer-based language models are making an impact in the field of Natural Language Processing (NLP). Because relevant parallels can be drawn between biological sequences and natural languages, the models used in NLP can be readily extended and adapted for applications in bioinformatics. This paper introduces recent developments in Transformer-based models in the context of nucleotide sequences. We have reviewed and analysed a large number of application-based papers on this subject, highlighting their main characterizing features and the different approaches that may be adopted to customize such powerful computational machines. Besides discussing what Transformers do and may do for the analysis of biological sequences, we also provide an overview of what Transformers are and why they work. We believe this review will help the scientific community understand the application of Transformer-based language models to nucleotide sequences, and that it will motivate readers to build on the idea of Transformers, as well as on the methodologies discussed, to tackle different problems in the field of bioinformatics.
Pages: 1244-1254
Page count: 11