MSSM: A Multiple-level Sparse Sharing Model for Efficient Multi-Task Learning

Cited by: 33
Authors
Ding, Ke [1 ]
Dong, Xin [1 ]
He, Yong [2 ]
Cheng, Lei [2 ]
Fu, Chilin [2 ]
Huan, Zhaoxin [2 ]
Li, Hai [1 ]
Yan, Tan [1 ]
Zhang, Liang [2 ]
Zhang, Xiaolu [2 ]
Mo, Linjian [1 ]
Affiliations
[1] Ant Grp, Shanghai, Peoples R China
[2] Ant Grp, Hangzhou, Peoples R China
Source
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2021
Keywords
Multi-task Learning; sparse connection; parameter sharing;
DOI
10.1145/3404835.3463022
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Multi-task learning (MTL) is an open and challenging problem in various real-world applications. The typical way of conducting multi-task learning is to establish a global parameter-sharing mechanism across all tasks, or to assign each task an individual set of parameters with cross-connections between tasks. However, in most existing approaches, all tasks share all features either fully or proportionally, without distinguishing which features are actually helpful. As a result, some tasks are disturbed by features that are useful only for other tasks, leading to undesired negative transfer between tasks. In this paper, we design a novel architecture named the Multiple-level Sparse Sharing Model (MSSM), which can learn features selectively and share knowledge across all tasks efficiently. MSSM first employs a field-level sparse connection module (FSCM) that enables much more expressive combinations of feature fields to be learned for generalization across tasks, while still allowing task-specific features to be customized for each task. Furthermore, a cell-level sparse sharing module (CSSM) recognizes the sharing pattern through a set of coding variables that selectively choose which cells to route for a given task. Extensive experimental results on several real-world datasets show that MSSM significantly outperforms SOTA models in terms of AUC and LogLoss metrics.
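To make the field-level selection idea in the abstract concrete, here is a minimal NumPy sketch of a per-task sparse gate over feature fields. This is a hypothetical illustration, not the paper's actual FSCM: the function names, the sigmoid-plus-threshold gating, and the learned score vector are all assumptions made for this sketch; the paper's module learns its sparse connections end to end.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def select_fields(field_embeddings, task_scores, threshold=0.5):
    """Gate shared field embeddings for one task.

    field_embeddings: (num_fields, dim) embeddings shared by all tasks.
    task_scores: (num_fields,) learnable per-task scores; a field is
    routed to this task only if sigmoid(score) exceeds the threshold.
    Returns the embeddings with unselected fields zeroed out.
    """
    gates = (sigmoid(task_scores) > threshold).astype(field_embeddings.dtype)
    return field_embeddings * gates[:, None]

rng = np.random.default_rng(0)
fields = rng.normal(size=(4, 8))                   # 4 feature fields, 8-dim each
scores_task_a = np.array([2.0, -2.0, 1.5, -1.0])   # task A keeps fields 0 and 2
out = select_fields(fields, scores_task_a)
print(out[1].sum(), out[3].sum())  # prints 0.0 0.0 (dropped fields are zeroed)
```

Because each task has its own score vector, different tasks can end up with different field subsets, which is the mechanism the abstract credits with avoiding negative transfer from unhelpful features.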
Pages: 2237-2241
Page count: 5