Deep multi-task learning with flexible and compact architecture search

Cited by: 6
Authors
Zhao, Jiejie [1 ,2 ]
Lv, Weifeng [1 ,2 ]
Du, Bowen [1 ,2 ]
Ye, Junchen [1 ,2 ]
Sun, Leilei [1 ,2 ]
Xiong, Guixi [1 ,2 ]
Affiliations
[1] Beihang Univ, SKLSDE Lab, Beijing 100191, Peoples R China
[2] Beihang Univ, BDBC Lab, Beijing 100191, Peoples R China
Funding
National Key R&D Program of China; US National Science Foundation;
Keywords
Multi-task learning; Network architecture; Feedforward neural networks; Parameter generation; Task relationship;
DOI
10.1007/s41060-021-00274-0
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-task learning has been applied successfully in a wide range of applications. Recent research shows that the performance of multi-task learning methods can be improved by appropriately sharing model architectures. However, existing work either identifies the multi-task architecture manually based on prior knowledge, or simply uses an identical model structure for all tasks with a parameter-sharing mechanism. In this paper, we propose a novel architecture search method that automatically discovers flexible and compact architectures for deep multi-task learning; it not only extends the expressiveness of existing reinforcement learning-based neural architecture search methods, but also enhances the flexibility of existing hand-crafted multi-task learning methods. The discovered architecture shares structure and parameters adaptively to handle different levels of task relatedness, improving effectiveness. First, for deep multi-task learning, we propose an architecture search space that combines partially shared modules at low-level layers with a set of task-specific modules of various depths at high-level layers. Second, a parameter generation mechanism is proposed both to explore all possible cross-layer connections and to reduce the search cost. Third, we propose a task-specific shadow batch normalization mechanism to stabilize the training process and improve the search effectiveness. Finally, an auxiliary module is designed to guide the model training process. Experimental results demonstrate that the learned architectures outperform state-of-the-art methods with fewer learnable parameters.
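To make the abstract's description concrete, below is a minimal sketch (assuming PyTorch; it does not reproduce the authors' code or search procedure) of the kind of network the described search space covers: a partially shared low-level module whose batch normalization statistics are kept per task ("shadow" BN), followed by task-specific high-level heads of different depths. All names here (TaskSpecificBN, MultiTaskNet, head_depths) are hypothetical illustrations, not identifiers from the paper.

```python
import torch
import torch.nn as nn


class TaskSpecificBN(nn.Module):
    """One BatchNorm1d per task ('shadow' BN): the surrounding weights are
    shared, but each task keeps its own normalization statistics."""

    def __init__(self, num_features: int, num_tasks: int):
        super().__init__()
        self.bns = nn.ModuleList([nn.BatchNorm1d(num_features) for _ in range(num_tasks)])

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        return self.bns[task_id](x)


class MultiTaskNet(nn.Module):
    """A partially shared low-level module plus task-specific heads of varying depth."""

    def __init__(self, in_dim: int, hidden: int, head_depths, out_dims):
        super().__init__()
        num_tasks = len(head_depths)
        self.shared = nn.Linear(in_dim, hidden)              # shared low-level weights
        self.shared_bn = TaskSpecificBN(hidden, num_tasks)   # task-specific shadow BN
        # Task-specific high-level modules, each with its own depth.
        self.heads = nn.ModuleList()
        for depth, out_dim in zip(head_depths, out_dims):
            layers = []
            for _ in range(depth):
                layers += [nn.Linear(hidden, hidden), nn.ReLU()]
            layers.append(nn.Linear(hidden, out_dim))
            self.heads.append(nn.Sequential(*layers))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        h = torch.relu(self.shared_bn(self.shared(x), task_id))
        return self.heads[task_id](h)


# Two tasks: task 0 uses a shallow head, task 1 a deeper one.
net = MultiTaskNet(in_dim=16, hidden=32, head_depths=[1, 3], out_dims=[1, 4])
y0 = net(torch.randn(8, 16), task_id=0)  # shape (8, 1)
y1 = net(torch.randn(8, 16), task_id=1)  # shape (8, 4)
```

In the paper's method, which modules are shared and how deep each task-specific branch is are chosen by the search procedure (with the parameter generation mechanism and auxiliary module guiding training); the sketch above simply fixes one such candidate architecture by hand.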
Pages: 187-199
Number of pages: 13