Empirical evaluation of multi-task learning in deep neural networks for natural language processing

Cited by: 0
Authors
Jianquan Li
Xiaokang Liu
Wenpeng Yin
Min Yang
Liqun Ma
Yaohong Jin
Affiliations
[1] Beijing Ultrapower Software Co., Ltd.
[2] Department of Computer and Information Science, University of Pennsylvania
[3] Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Source
Neural Computing and Applications | 2021 / Vol. 33
Keywords
Natural language processing; Multi-task learning; Deep learning
DOI
Not available
Abstract
Multi-task learning (MTL) aims to boost the performance of each individual task by leveraging useful information contained in multiple related tasks. It has shown great success in natural language processing (NLP). A number of MTL architectures and learning mechanisms have been proposed for various NLP tasks, including exploring linguistic hierarchies, orthogonality constraints, adversarial learning, gating mechanisms, and label embeddings. However, there has been no systematic, in-depth exploration and comparison of these MTL architectures and learning mechanisms to explain their strong performance. In this paper, we conduct a thorough examination of five typical MTL methods with deep learning architectures on a broad range of representative NLP tasks. Our primary goal is to understand the merits and demerits of existing MTL methods in NLP tasks, and thereby to devise new hybrid architectures intended to combine their strengths. Following the empirical evaluation, we offer our insights and conclusions regarding the MTL methods we have considered.
Pages: 4417-4428
Page count: 11