Language Models Based on Deep Learning: A Review

Cited by: 0
Authors
Wang N.-Y. [1 ]
Ye Y.-X. [1 ,3 ]
Liu L. [2 ,3 ]
Feng L.-Z. [4 ]
Bao T. [1 ]
Peng T. [1 ,3 ]
Affiliations
[1] College of Computer Science and Technology, Jilin University, Changchun
[2] College of Software, Jilin University, Changchun
[3] Key Laboratory of Symbol Computation and Knowledge Engineering for Ministry of Education, Jilin University, Changchun
[4] Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607
Source
Peng, Tao (tpeng@jlu.edu.cn) | Chinese Academy of Sciences, Vol. 32 (2021)
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Language model; Natural language processing; Neural language model; Pre-training;
DOI
10.13328/j.cnki.jos.006169
Abstract
As a means of expressing the implicit knowledge of a language, the language model has received wide attention as a fundamental problem in natural language processing, and the current research hotspot is the language model based on deep learning. Through pre-training and fine-tuning, such models exhibit strong representational power and greatly improve the performance of downstream tasks. Centering on basic principles and different application directions, this study takes the neural probabilistic language model and the pre-trained language model as the entry point for combining deep learning with natural language processing. Building on the basic concepts and theory of language models, the applications and challenges of neural probabilistic and pre-trained language models are introduced. The existing neural probabilistic and pre-trained language models and their methods are then compared and analyzed. In addition, the training methods of pre-trained language models are elaborated from two aspects: new training tasks and improved network structures. Current research directions for pre-trained models in model compression, knowledge fusion, multi-modality, and cross-lingual learning are summarized and evaluated. Finally, the bottlenecks of language models in natural language processing applications are summed up, and possible future research priorities are discussed. © Copyright 2021, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
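The pre-train/fine-tune paradigm summarized above can be illustrated with a minimal sketch. The snippet below assumes the Hugging Face transformers and PyTorch libraries and a generic English BERT checkpoint, none of which is prescribed by the paper; the model name, label count, and learning rate are illustrative only. It loads a pre-trained encoder, attaches a task-specific classification head, and performs a single fine-tuning step on a toy labeled sentence.

# Minimal sketch of the pre-train/fine-tune paradigm (illustrative; not from the paper).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a publicly available pre-trained encoder and add a task-specific head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# One fine-tuning step on a toy labeled example (downstream task: sentence classification).
inputs = tokenizer("Pre-training learns general-purpose representations.", return_tensors="pt")
labels = torch.tensor([1])
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

outputs = model(**inputs, labels=labels)  # classification loss against the toy label
outputs.loss.backward()                   # gradients flow into the pre-trained weights
optimizer.step()

In practice the same pre-trained checkpoint is reused across many downstream tasks by swapping the head and repeating this loop over task-specific data, which is the property the review credits for the large performance gains of pre-trained language models.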
Pages: 1082-1115
Number of pages: 33