Recent Advances in Natural Language Processing via Large Pre-trained Language Models: A Survey

Cited by: 324
Authors
Min, Bonan [1 ]
Ross, Hayley [2 ]
Sulem, Elior [3 ]
Ben Veyseh, Amir Pouran [4 ]
Nguyen, Thien Huu [5 ]
Sainz, Oscar [6 ]
Agirre, Eneko [6 ]
Heintz, Ilana [7 ]
Roth, Dan [8 ]
Affiliations
[1] Amazon AWS AI Labs, 2795 Augustine Dr, Santa Clara, CA 95054 USA
[2] Harvard Univ, Dept Linguist, Boylston Hall, 3rd Floor, Cambridge, MA 02138 USA
[3] Ben Gurion Univ Negev, Dept Software & Informat Syst Engn, Bldg 96, Room 207, Marcus Family Campus, POB 653, IL-8410501 Beer Sheva, Israel
[4] 1841 Garden Ave, Unit 213, Eugene, OR USA
[5] Univ Oregon, 330 Deschutes Hall,1477 E 13th Ave, Eugene, OR 97403 USA
[6] Univ Basque Country UPV EHU, Manuel Lardizabal 1, Donostia San Sebastian 20008, Spain
[7] Synopt Engn, 3030 Clarendon Blvd, Arlington, VA 22201 USA
[8] Univ Penn, 3330 Walnut St, Philadelphia, PA 19104 USA
Keywords
Large language models; foundational models; generative AI; neural networks
DOI
10.1145/3605943
CLC Number
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Large, pre-trained language models (PLMs) such as BERT and GPT have drastically changed the Natural Language Processing (NLP) field. For numerous NLP tasks, approaches leveraging PLMs have achieved state-of-the-art performance. The key idea is to learn a generic, latent representation of language from a single generic task once, then share it across disparate NLP tasks. Language modeling serves as this generic task, since abundant unlabeled text is available for self-supervised training at scale. This article presents the key fundamental concepts of PLM architectures and a comprehensive view of the shift to PLM-driven NLP techniques. It surveys work applying the pre-train then fine-tune, prompting, and text generation approaches. In addition, it discusses PLM limitations and suggested directions for future research.
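The following is a minimal sketch of the "pre-train once, then fine-tune per task" paradigm the abstract describes, written with the Hugging Face Transformers library as an assumed toolchain (the survey itself is framework-agnostic). The model checkpoint, toy sentiment data, and hyperparameters are illustrative, not taken from the paper.

```python
# Minimal pre-train-then-fine-tune sketch (assumption: Hugging Face Transformers + PyTorch).
# A generic PLM (here bert-base-uncased) is loaded and adapted to a downstream task
# (binary sentiment classification) with a small amount of labeled data.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # new task-specific classification head
)

# Toy labeled examples standing in for a downstream dataset.
texts = ["The movie was great.", "The plot made no sense."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few gradient steps on the toy batch
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In the prompting approach also surveyed, the same pre-trained weights would instead be kept frozen and the task reformulated as a text completion, so no task-specific head or gradient updates are required.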
Pages: 40