A Historical Survey of Advances in Transformer Architectures

Cited by: 1

Authors:

Sajun, Ali Reza [1]

Zualkernan, Imran [1]

Sankalpa, Donthi [1]

Affiliations:

[1] American University of Sharjah, Computer Science and Engineering Department, P.O. Box 26666, Sharjah, United Arab Emirates

Source:

APPLIED SCIENCES-BASEL | 2024 / Volume 14 / Issue 10

Keywords:

transformers; deep learning; generative deep learning; large language models; GPT; computer vision

DOI:

10.3390/app14104316

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Discipline classification code:

0703

Abstract:

In recent times, transformer-based deep learning models have risen to prominence in machine learning for a variety of tasks such as computer vision and text generation. Given this increased interest, a historical look at the development and rapid progression of transformer-based models is needed to understand the rise of this key architecture. This paper presents a survey of key works related to the early development and implementation of transformer models in various domains, such as generative deep learning and as backbones of large language models. Previous works are classified based on their historical approaches, followed by key works in the domains of text-based applications, image-based applications, and miscellaneous applications. A quantitative and qualitative analysis of the various approaches is presented. Additionally, recent directions of transformer-related research, such as those in the biomedical and time-series domains, are discussed. Finally, future research opportunities, especially regarding multi-modality and the optimization of the transformer training process, are identified.

Pages: 27

References: 101 in total

