MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning

Cited by: 20
Authors
Xu, Xiaogang [1]
Zhao, Hengshuang [2,3]
Vineet, Vibhav [4]
Lim, Ser-Nam [5]
Torralba, Antonio [2]
Affiliations
[1] CUHK, Hong Kong, Peoples R China
[2] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[3] HKU, Hong Kong, Peoples R China
[4] Microsoft Res, Redmond, WA USA
[5] Meta AI, New York, NY USA
Source
COMPUTER VISION - ECCV 2022, PT XXVII | 2022, Vol. 13687
Keywords
Multi-task learning; Transformer; Cross-task reasoning
DOI
10.1007/978-3-031-19812-0_18
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we explore the advantages of utilizing transformer structures for multi-task learning (MTL). Specifically, we demonstrate that models with transformer structures are more appropriate for MTL than convolutional neural networks (CNNs), and we propose a novel transformer-based architecture named MTFormer for MTL. In this framework, multiple tasks share the same transformer encoder and transformer decoder, and lightweight branches are introduced to harvest task-specific outputs, which improves MTL performance while reducing time and space complexity. Furthermore, since information from different task domains can benefit the others, we conduct cross-task reasoning and propose a cross-task attention mechanism that further boosts MTL results. The cross-task attention mechanism adds few parameters and little computation while delivering additional performance improvements. In addition, we design a self-supervised cross-task contrastive learning algorithm to further improve MTL performance. Extensive experiments on two multi-task learning datasets show that MTFormer achieves state-of-the-art results with limited network parameters and computation. It also demonstrates significant advantages in few-shot and zero-shot learning.
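To make the described design concrete, below is a minimal PyTorch sketch of the shared encoder/decoder with lightweight task-specific branches and a cross-task attention step, as outlined in the abstract. All module names, dimensions, and the exact attention wiring (single linear projections as the "lightweight branches", each task's tokens attending to the other tasks' tokens) are illustrative assumptions, not the authors' implementation; the self-supervised cross-task contrastive loss is not covered here.

```python
# Hypothetical sketch based only on the abstract's description; not the
# paper's actual architecture or code.
import torch
import torch.nn as nn


class CrossTaskAttention(nn.Module):
    """Each task's tokens attend to the tokens of all other tasks."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats):
        # feats: list of per-task token tensors, each of shape (B, N, C).
        fused = []
        for i, f in enumerate(feats):
            others = torch.cat([g for j, g in enumerate(feats) if j != i], dim=1)
            attended, _ = self.attn(query=f, key=others, value=others)
            fused.append(self.norm(f + attended))  # residual fusion
        return fused


class MTFormerSketch(nn.Module):
    def __init__(self, dim=256, depth=2, num_heads=8, out_dims=(21, 1)):
        super().__init__()

        def make_layer():
            return nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)

        # All tasks share the same transformer encoder and decoder weights.
        self.shared_encoder = nn.TransformerEncoder(make_layer(), depth)
        self.shared_decoder = nn.TransformerEncoder(make_layer(), depth)
        # Lightweight task-specific branches: one projection + head per task.
        self.task_proj = nn.ModuleList(nn.Linear(dim, dim) for _ in out_dims)
        self.task_head = nn.ModuleList(nn.Linear(dim, d) for d in out_dims)
        self.cross_attn = CrossTaskAttention(dim, num_heads)

    def forward(self, tokens):
        # tokens: (B, N, C) patch embeddings of the input image.
        shared = self.shared_decoder(self.shared_encoder(tokens))
        task_feats = [proj(shared) for proj in self.task_proj]
        task_feats = self.cross_attn(task_feats)  # cross-task reasoning
        return [head(f) for head, f in zip(self.task_head, task_feats)]


# Usage: two tasks, e.g. 21-class segmentation logits and 1-channel depth.
model = MTFormerSketch()
x = torch.randn(2, 196, 256)       # batch of 14x14 patch tokens
seg_logits, depth_pred = model(x)  # shapes (2, 196, 21) and (2, 196, 1)
```

Because the heavy encoder/decoder is shared and only the per-task projections and heads differ, adding a task costs little in parameters, which matches the abstract's claim of limited time and space overhead.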
Pages: 304-321
Number of pages: 18