CPM: A large-scale generative Chinese Pre-trained language model

Cited by: 42
Authors
Zhang, Zhengyan [1 ]
Han, Xu [1 ]
Zhou, Hao [1 ]
Ke, Pei [1 ]
Gu, Yuxian [1 ]
Ye, Deming [1 ]
Qin, Yujia [1 ]
Su, Yusheng [1 ]
Ji, Haozhe [1 ]
Guan, Jian [1 ]
Qi, Fanchao [1 ]
Wang, Xiaozhi [1 ]
Zheng, Yanan [1 ]
Zeng, Guoyang [1 ]
Cao, Huanqi [1 ]
Chen, Shengqi [1 ]
Li, Daixuan [1 ]
Sun, Zhenbo [1 ]
Liu, Zhiyuan [1 ]
Huang, Minlie [1 ]
Han, Wentao [1 ]
Tang, Jie [1 ]
Li, Juanzi [1 ]
Zhu, Xiaoyan [1 ]
Sun, Maosong [1 ]
Affiliations
[1] Tsinghua University, Department of Computer Science and Technology, Beijing, People's Republic of China
Source
AI OPEN | 2021, Vol. 2
Keywords
Pre-trained language model; Zero-shot learning
DOI
10.1016/j.aiopen.2021.07.001
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Pre-trained Language Models (PLMs) have proven beneficial for various downstream NLP tasks. Recently, GPT-3, with 175 billion parameters and 570 GB of training data, drew significant attention for its capacity for few-shot (even zero-shot) learning. However, applying GPT-3 to Chinese NLP tasks remains challenging, as its training corpus is primarily English and its parameters are not publicly available. In this technical report, we release the Chinese Pre-trained Language Model (CPM), built with generative pre-training on large-scale Chinese training data. To the best of our knowledge, CPM, with 2.6 billion parameters and 100 GB of Chinese training data, is the largest Chinese pre-trained language model, and it can facilitate several downstream Chinese NLP tasks, such as conversation, essay generation, cloze test, and language understanding. Extensive experiments demonstrate that CPM achieves strong performance on many NLP tasks in few-shot (even zero-shot) settings. The code and parameters are available at https://github.com/TsinghuaAI/CPM.
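As a rough illustration of the zero-shot usage described in the abstract, the sketch below loads a CPM-style generative checkpoint through the Hugging Face transformers library and continues a Chinese prompt. The checkpoint ID and loading path are assumptions for illustration only, not the authors' official release procedure; see the GitHub repository above for the authors' own scripts.

    # Minimal zero-shot generation sketch, assuming a CPM-style causal language
    # model checkpoint is usable through the Hugging Face transformers library.
    # The checkpoint ID below is an assumed placeholder, not the official path.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TsinghuaAI/CPM-Generate"  # hypothetical checkpoint ID
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Zero-shot generation: give the model a Chinese prefix and let it continue,
    # as in the essay-generation and conversation settings described above.
    prompt = "清华大学的校训是"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))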
Pages: 93-99
Number of pages: 7