Bioinformatics;
DNA/RNA sequences;
Natural language processing;
Nucleotide sequences;
Transformers;
METAGENOMIC DATA;
DNA METHYLATION;
SITES;
DOI:
10.1016/j.csbj.2025.03.024
Chinese Library Classification (CLC) number:
Q5 [Biochemistry];
Q7 [Molecular Biology];
Discipline classification code:
071010 ;
081704 ;
Abstract:
Transformer-based language models are making an impact in the field of Natural Language Processing (NLP). Because relevant parallels can be drawn between biological sequences and natural languages, the models used in NLP can be readily extended and adapted for applications in bioinformatics. This paper introduces recent developments in Transformer-based models in the context of nucleotide sequences. We have reviewed and analysed a large number of application-based papers on this subject, highlighting their main characterizing features and the different approaches that may be adopted to customize such powerful computational machines. Besides discussing what Transformers do and may do for the analysis of biological sequences, we also provide an overview of what Transformers are and why they work. We believe this review will help the scientific community understand the application of Transformer-based language models to nucleotide sequences, and that it will motivate readers to build on the idea of Transformers, as well as on the methodologies discussed, to tackle different problems in the field of bioinformatics.