Recent Advances in Natural Language Processing via Large Pre-trained Language Models: A Survey

Cited by: 493
Authors
Min, Bonan [1 ]
Ross, Hayley [2 ]
Sulem, Elior [3 ]
Ben Veyseh, Amir Pouran [4 ]
Nguyen, Thien Huu [5 ]
Sainz, Oscar [6 ]
Agirre, Eneko [6 ]
Heintz, Ilana [7 ]
Roth, Dan [8 ]
Affiliations
[1] Amazon AWS AI Labs, 2795 Augustine Dr, Santa Clara, CA 95054 USA
[2] Harvard Univ, Dept Linguist, Boylston Hall, 3rd Floor, Cambridge, MA 02138 USA
[3] Ben Gurion Univ Negev, Dept Software & Informat Syst Engn, Bldg 96, Room 207, Marcus Family Campus, POB 653, IL-8410501 Beer Sheva, Israel
[4] 1841 Garden Ave, Unit 213, Eugene, OR USA
[5] Univ Oregon, 330 Deschutes Hall, 1477 E 13th Ave, Eugene, OR 97403 USA
[6] Univ Basque Country UPV EHU, Manuel Lardizabal 1, Donostia San Sebastian 20008, Spain
[7] Synopt Engn, 3030 Clarendon Blvd, Arlington, VA 22201 USA
[8] Univ Penn, 3330 Walnut St, Philadelphia, PA 19104 USA
Keywords
Large language models; foundational models; generative AI; neural networks
DOI
10.1145/3605943
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Discipline Classification Code
081202
Abstract
Large, pre-trained language models (PLMs) such as BERT and GPT have drastically changed the Natural Language Processing (NLP) field. For numerous NLP tasks, approaches leveraging PLMs have achieved state-of-the-art performance. The key idea is to learn a generic, latent representation of language from a generic task once, then share it across disparate NLP tasks. Language modeling serves as the generic task, one with abundant self-supervised text available for extensive training. This article presents the key fundamental concepts of PLM architectures and a comprehensive view of the shift to PLM-driven NLP techniques. It surveys work applying the pre-training then fine-tuning, prompting, and text generation approaches. In addition, it discusses PLM limitations and suggested directions for future research.
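The abstract above summarizes the pre-train then fine-tune paradigm. The following is a minimal Python sketch of that workflow using the Hugging Face transformers library; the checkpoint name (bert-base-uncased), the toy sentiment examples, and the hyperparameters are illustrative assumptions rather than details drawn from the survey itself.

# Minimal sketch of the "pre-train then fine-tune" paradigm described in the
# abstract. The checkpoint, toy data, and hyperparameters are assumptions made
# for illustration, not details from the survey.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# 1. Load a pre-trained PLM: a generic language representation learned once.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# 2. Fine-tune the shared representation on task-specific labeled data
#    (two toy sentiment examples stand in for a real corpus).
texts = ["A clear and useful survey.", "The exposition is hard to follow."]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few gradient steps stand in for a full training loop
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# 3. The fine-tuned model now maps new text to task labels.
model.eval()
with torch.no_grad():
    logits = model(**tokenizer(["Well worth reading."], return_tensors="pt")).logits
print(logits.argmax(dim=-1))

In practice the same pre-trained checkpoint can be fine-tuned separately for many disparate tasks, which is the sharing of a generic language representation that the abstract describes.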
Pages: 40