A Systematic Review of Transformer-Based Pre-Trained Language Models through Self-Supervised Learning

Cited by: 32
Authors
Kotei, Evans [1 ]
Thirunavukarasu, Ramkumar [1 ]
Affiliations
[1] Vellore Institute of Technology, School of Information Technology and Engineering, Vellore 632014, India
Keywords
transformer network; transfer learning; pretraining; natural language processing; language models; BERT
DOI
10.3390/info14030187
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Transfer learning is a technique used in deep learning applications to transfer learned knowledge to a different target domain. The approach mainly addresses the problem of small training datasets, which leads to model overfitting and degrades model performance. The study was carried out on publications retrieved from digital libraries such as SCOPUS, ScienceDirect, IEEE Xplore, ACM Digital Library, and Google Scholar, which formed the primary studies. Secondary studies were retrieved from the primary articles using the backward and forward snowballing approach. Relevant publications were selected for review based on defined inclusion and exclusion criteria. The study focused on transfer learning with pretrained NLP models based on the deep transformer network. BERT and GPT are the two leading pretrained models, trained to learn global and local representations from large unlabeled text datasets through self-supervised learning. Pretrained transformer models offer numerous advantages to natural language processing, such as knowledge transfer to downstream tasks, which mitigates the drawbacks of training a model from scratch. This review gives a comprehensive view of the transformer architecture, of self-supervised learning and pretraining concepts in language models, and of their adaptation to downstream tasks. Finally, we present future directions for further improving pretrained transformer-based language models.
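For illustration only, the following minimal sketch (assuming the Hugging Face transformers library with PyTorch and the public bert-base-uncased checkpoint; it is not code from the reviewed study) shows the two ideas the abstract highlights: the self-supervised masked-language-model pretraining objective, and transfer of the pretrained encoder to a downstream classification task.

```python
# A minimal, illustrative sketch, assuming the Hugging Face `transformers`
# library (with PyTorch) and the public `bert-base-uncased` checkpoint;
# this is not code from the reviewed paper itself.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# 1) Self-supervised pretraining objective: predict a masked token from its
#    context, so the model learns representations from unlabeled text alone.
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
inputs = tokenizer(
    "Transfer learning reuses [MASK] representations.", return_tensors="pt"
)
with torch.no_grad():
    logits = mlm(**inputs).logits
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
print("MLM guess:", tokenizer.decode([logits[0, mask_pos].argmax().item()]))

# 2) Transfer to a downstream task: reuse the pretrained encoder and attach a
#    small, randomly initialized classification head, which is then fine-tuned
#    on labeled task data instead of training the whole model from scratch.
clf = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
with torch.no_grad():
    print("Untrained-head logits:", clf(**inputs).logits)  # shape: (1, 2)
```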
Pages: 25