Language Models Based on Deep Learning: A Review

Cited by: 0
Authors
Wang N.-Y. [1 ]
Ye Y.-X. [1 ,3 ]
Liu L. [2 ,3 ]
Feng L.-Z. [4 ]
Bao T. [1 ]
Peng T. [1 ,3 ]
Affiliations
[1] College of Computer Science and Technology, Jilin University, Changchun
[2] College of Software, Jilin University, Changchun
[3] Key Laboratory of Symbol Computation and Knowledge Engineering for Ministry of Education, Jilin University, Changchun
[4] Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607
Source
Peng, Tao (tpeng@jlu.edu.cn) | Chinese Academy of Sciences, Vol. 32, 2021
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Language model; Natural language processing; Neural language model; Pre-training;
DOI
10.13328/j.cnki.jos.006169
Abstract
As a fundamental problem in natural language processing, the language model, which expresses implicit linguistic knowledge, has received wide attention, and the current research hotspot is the language model based on deep learning. Through pre-training and fine-tuning techniques, language models exhibit strong inherent representational power and greatly improve the performance of downstream tasks. Focusing on basic principles and different application directions, this study takes the neural probabilistic language model and the pre-trained language model as the entry points for combining deep learning with natural language processing. Based on the basic concepts and theories of language models, the applications and challenges of neural probabilistic and pre-trained language models are introduced. The existing neural probabilistic and pre-trained language models, together with their methods, are then compared and analyzed. In addition, the training methods of pre-trained language models are elaborated from two aspects: new training tasks and improved network structures. Meanwhile, the current research directions of pre-trained models in scale compression, knowledge fusion, multi-modality, and cross-language settings are summarized and evaluated. Finally, the bottlenecks of language models in natural language processing applications are summed up, and possible future research priorities are discussed. © Copyright 2021, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
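For context, the following standard definitions (not drawn from the record itself) summarize what the abstract means by a language model and its neural probabilistic variant: the probability of a token sequence is factorized with the chain rule, each conditional is parameterized by a neural network f_theta, and training minimizes the cross-entropy of the corpus.

\[
P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1}),
\qquad
P_\theta(w_t \mid w_{<t}) = \mathrm{softmax}\big(f_\theta(w_{<t})\big)_{w_t},
\]
\[
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log P_\theta(w_t \mid w_{<t}).
\]

The pre-training and fine-tuning paradigm mentioned in the abstract corresponds to first minimizing such an objective on a large unlabeled corpus and then continuing to train the same parameters on a labeled downstream task.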
Pages: 1082-1115
Page count: 33