MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning

Cited by: 20
Authors
Xu, Xiaogang [1 ]
Zhao, Hengshuang [2 ,3 ]
Vineet, Vibhav [4 ]
Lim, Ser-Nam [5 ]
Torralba, Antonio [2 ]
Affiliations
[1] CUHK, Hong Kong, Peoples R China
[2] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[3] HKU, Hong Kong, Peoples R China
[4] Microsoft Res, Redmond, WA USA
[5] Meta AI, New York, NY USA
Source
COMPUTER VISION - ECCV 2022, PT XXVII | 2022 / Vol. 13687
Keywords
Multi-task learning; Transformer; Cross-task reasoning
DOI
10.1007/978-3-031-19812-0_18
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we explore the advantages of using transformer structures for multi-task learning (MTL). Specifically, we demonstrate that transformer-based models are better suited to MTL than convolutional neural networks (CNNs), and we propose a novel transformer-based architecture named MTFormer. In this framework, multiple tasks share the same transformer encoder and transformer decoder, and lightweight branches produce task-specific outputs, which improves MTL performance while reducing time and space complexity. Furthermore, since information from different task domains can benefit each task, we perform cross-task reasoning and propose a cross-task attention mechanism that further boosts MTL results, adding few parameters and little computation while yielding extra performance gains. In addition, we design a self-supervised cross-task contrastive learning algorithm that further improves MTL performance. Extensive experiments on two multi-task learning datasets show that MTFormer achieves state-of-the-art results with limited network parameters and computation; it also demonstrates significant advantages in few-shot and zero-shot learning.
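The architecture described in the abstract can be summarized in a short sketch. The PyTorch code below is a minimal illustration of the idea, not the paper's implementation: the patch embedding, layer counts, feature dimension, task names, and the exact wiring of the cross-task attention (here, each task's tokens attend to the concatenated tokens of the other tasks) are all assumptions made for clarity.

```python
# Minimal sketch of the MTFormer idea: a shared transformer encoder/decoder,
# lightweight per-task branches, and a cross-task attention step.
# All sizes and the exact wiring are illustrative assumptions.
import torch
import torch.nn as nn


class MTFormerSketch(nn.Module):
    def __init__(self, dim=256, num_heads=8, depth=4, tasks=("semseg", "depth")):
        super().__init__()
        # Patch embedding: 16x16 non-overlapping patches -> token sequence.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        # One encoder stack and one decoder stack shared by all tasks.
        self.shared_encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.shared_decoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Lightweight task-specific branches (a single projection each).
        self.branches = nn.ModuleDict({t: nn.Linear(dim, dim) for t in tasks})
        # Cross-task attention: one task's tokens attend to the others'.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, images):
        tokens = self.patch_embed(images).flatten(2).transpose(1, 2)  # (B, N, C)
        shared = self.shared_decoder(self.shared_encoder(tokens))
        feats = {t: branch(shared) for t, branch in self.branches.items()}
        outputs = {}
        for t, f in feats.items():
            # Tokens from all other task branches serve as keys/values.
            others = torch.cat([v for k, v in feats.items() if k != t], dim=1)
            fused, _ = self.cross_attn(f, others, others)
            outputs[t] = f + fused  # residual fusion of cross-task information
        return outputs
```

Under this wiring, the heavy encoder/decoder cost is paid once per image regardless of the number of tasks, which is consistent with the abstract's claim of reduced time and space complexity.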
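The self-supervised cross-task contrastive objective can likewise be sketched. The pairing scheme below (pooled features of the same image from two task branches form positives, other images in the batch form negatives, scored InfoNCE-style) is an assumption in the spirit of the abstract, not the paper's exact loss; the pooling and temperature are also illustrative.

```python
# Hedged sketch of a cross-task contrastive loss: pull together features of
# the SAME image produced by two task branches, push apart features from
# different images (symmetric InfoNCE). Details are assumptions.
import torch
import torch.nn.functional as F


def cross_task_contrastive_loss(feat_a, feat_b, temperature=0.1):
    """feat_a, feat_b: (B, N, C) token features from two task branches."""
    za = F.normalize(feat_a.mean(dim=1), dim=-1)  # (B, C) pooled embeddings
    zb = F.normalize(feat_b.mean(dim=1), dim=-1)
    logits = za @ zb.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(za.size(0), device=za.device)
    # Diagonal pairs (same image, different tasks) are the positives.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

In training, such a term would be added to the per-task supervised losses; the weighting between them is not specified by the abstract.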
Pages: 304-321
Page count: 18