MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning

Cited by: 20
Authors
Xu, Xiaogang [1]
Zhao, Hengshuang [2,3]
Vineet, Vibhav [4]
Lim, Ser-Nam [5]
Torralba, Antonio [2]
Affiliations
[1] CUHK, Hong Kong, Peoples R China
[2] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[3] HKU, Hong Kong, Peoples R China
[4] Microsoft Res, Redmond, WA USA
[5] Meta AI, New York, NY USA
Source
COMPUTER VISION - ECCV 2022, PT XXVII | 2022, Vol. 13687
Keywords
Multi-task learning; Transformer; Cross-task reasoning
DOI
10.1007/978-3-031-19812-0_18
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we explore the advantages of utilizing transformer structures for multi-task learning (MTL). Specifically, we demonstrate that models with transformer structures are more appropriate for MTL than convolutional neural networks (CNNs), and we propose a novel transformer-based architecture named MTFormer for MTL. In this framework, multiple tasks share the same transformer encoder and transformer decoder, and lightweight branches are introduced to harvest task-specific outputs, which improves MTL performance while reducing time and space complexity. Furthermore, since information from different task domains can benefit the others, we conduct cross-task reasoning and propose a cross-task attention mechanism that further boosts MTL results. The cross-task attention mechanism adds few parameters and little computation while delivering additional performance improvements. In addition, we design a self-supervised cross-task contrastive learning algorithm to further improve MTL performance. Extensive experiments on two multi-task learning datasets show that MTFormer achieves state-of-the-art results with limited network parameters and computation. It also demonstrates significant advantages in few-shot and zero-shot learning.
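To make the described design concrete, below is a minimal PyTorch sketch of the shared encoder/decoder with lightweight task-specific branches and a cross-task attention step, as outlined in the abstract. All module names, dimensions, and the exact attention wiring (single linear projections as the "lightweight branches", each task's tokens attending to the other tasks' tokens) are illustrative assumptions, not the authors' implementation; the self-supervised cross-task contrastive loss is not covered here.

```python
# Hypothetical sketch based only on the abstract's description; not the
# paper's actual architecture or code.
import torch
import torch.nn as nn


class CrossTaskAttention(nn.Module):
    """Each task's tokens attend to the tokens of all other tasks."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats):
        # feats: list of per-task token tensors, each of shape (B, N, C).
        fused = []
        for i, f in enumerate(feats):
            others = torch.cat([g for j, g in enumerate(feats) if j != i], dim=1)
            attended, _ = self.attn(query=f, key=others, value=others)
            fused.append(self.norm(f + attended))  # residual fusion
        return fused


class MTFormerSketch(nn.Module):
    def __init__(self, dim=256, depth=2, num_heads=8, out_dims=(21, 1)):
        super().__init__()

        def make_layer():
            return nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)

        # All tasks share the same transformer encoder and decoder weights.
        self.shared_encoder = nn.TransformerEncoder(make_layer(), depth)
        self.shared_decoder = nn.TransformerEncoder(make_layer(), depth)
        # Lightweight task-specific branches: one projection + head per task.
        self.task_proj = nn.ModuleList(nn.Linear(dim, dim) for _ in out_dims)
        self.task_head = nn.ModuleList(nn.Linear(dim, d) for d in out_dims)
        self.cross_attn = CrossTaskAttention(dim, num_heads)

    def forward(self, tokens):
        # tokens: (B, N, C) patch embeddings of the input image.
        shared = self.shared_decoder(self.shared_encoder(tokens))
        task_feats = [proj(shared) for proj in self.task_proj]
        task_feats = self.cross_attn(task_feats)  # cross-task reasoning
        return [head(f) for head, f in zip(self.task_head, task_feats)]


# Usage: two tasks, e.g. 21-class segmentation logits and 1-channel depth.
model = MTFormerSketch()
x = torch.randn(2, 196, 256)       # batch of 14x14 patch tokens
seg_logits, depth_pred = model(x)  # shapes (2, 196, 21) and (2, 196, 1)
```

Because the heavy encoder/decoder is shared and only the per-task projections and heads differ, adding a task costs little in parameters, which matches the abstract's claim of limited time and space overhead.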
Pages: 304-321
Number of pages: 18