Recent Advances in Natural Language Processing via Large Pre-trained Language Models: A Survey

Cited by: 493
Authors
Min, Bonan [1 ]
Ross, Hayley [2 ]
Sulem, Elior [3 ]
Ben Veyseh, Amir Pouran [4 ]
Nguyen, Thien Huu [5 ]
Sainz, Oscar [6 ]
Agirre, Eneko [6 ]
Heintz, Ilana [7 ]
Roth, Dan [8 ]
Affiliations
[1] Amazon AWS AI Labs, 2795 Augustine Dr, Santa Clara, CA 95054 USA
[2] Harvard Univ, Dept Linguist, Boylston Hall, 3rd Floor, Cambridge, MA 02138 USA
[3] Ben Gurion Univ Negev, Dept Software & Informat Syst Engn, Bldg 96, Room 207, Marcus Family Campus, POB 653, IL-8410501 Beer Sheva, Israel
[4] 1841 Garden Ave, Unit 213, Eugene, OR USA
[5] Univ Oregon, 330 Deschutes Hall, 1477 E 13th Ave, Eugene, OR 97403 USA
[6] Univ Basque Country UPV EHU, Manuel Lardizabal 1, Donostia San Sebastian 20008, Spain
[7] Synopt Engn, 3030 Clarendon Blvd, Arlington, VA 22201 USA
[8] Univ Penn, 3330 Walnut St, Philadelphia, PA 19104 USA
Keywords
Large language models; foundational models; generative AI; neural networks
DOI
10.1145/3605943
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Discipline Classification Code
081202
Abstract
Large, pre-trained language models (PLMs) such as BERT and GPT have drastically changed the Natural Language Processing (NLP) field. For numerous NLP tasks, approaches leveraging PLMs have achieved state-of-the-art performance. The key idea is to learn a generic, latent representation of language from a generic task once, then share it across disparate NLP tasks. Language modeling serves as the generic task, one with abundant self-supervised text available for extensive training. This article presents the key fundamental concepts of PLM architectures and a comprehensive view of the shift to PLM-driven NLP techniques. It surveys work applying the pre-training then fine-tuning, prompting, and text generation approaches. In addition, it discusses PLM limitations and suggested directions for future research.
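The abstract above summarizes the pre-train then fine-tune paradigm. The following is a minimal Python sketch of that workflow using the Hugging Face transformers library; the checkpoint name (bert-base-uncased), the toy sentiment examples, and the hyperparameters are illustrative assumptions rather than details drawn from the survey itself.

# Minimal sketch of the "pre-train then fine-tune" paradigm described in the
# abstract. The checkpoint, toy data, and hyperparameters are assumptions made
# for illustration, not details from the survey.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# 1. Load a pre-trained PLM: a generic language representation learned once.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# 2. Fine-tune the shared representation on task-specific labeled data
#    (two toy sentiment examples stand in for a real corpus).
texts = ["A clear and useful survey.", "The exposition is hard to follow."]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few gradient steps stand in for a full training loop
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# 3. The fine-tuned model now maps new text to task labels.
model.eval()
with torch.no_grad():
    logits = model(**tokenizer(["Well worth reading."], return_tensors="pt")).logits
print(logits.argmax(dim=-1))

In practice the same pre-trained checkpoint can be fine-tuned separately for many disparate tasks, which is the sharing of a generic language representation that the abstract describes.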
Pages: 40