Prompt Guided Transformer for Multi-Task Dense Prediction

Times Cited: 6
Authors
Lu, Yuxiang [1 ]
Sirejiding, Shalayiding [1 ]
Ding, Yue [1 ]
Wang, Chunlin [2 ]
Lu, Hongtao [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[2] Chuxiong Normal Univ, Sch Informat Sci & Technol, Chuxiong 675099, Peoples R China
Keywords
Multi-task learning; dense prediction; prompting; vision transformer
DOI
10.1109/TMM.2024.3349865
CLC Number
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
Task-conditional architectures offer an advantage in parameter efficiency but fall short in performance compared with state-of-the-art multi-decoder methods. How to trade off performance against model parameters is an important and difficult problem. In this paper, we introduce a simple and lightweight task-conditional model, the Prompt Guided Transformer (PGT), to address this challenge. Our approach designs a Prompt-conditioned Transformer block that incorporates task-specific prompts into the self-attention mechanism to achieve global dependency modeling and parameter-efficient feature adaptation across multiple tasks. This block is integrated into both the shared encoder and the decoder, enhancing the capture of intra- and inter-task features. Moreover, we design a lightweight decoder to further reduce parameter usage; it accounts for only 2.7% of the total model parameters. Extensive experiments on two multi-task dense prediction benchmarks, PASCAL-Context and NYUD-v2, demonstrate that our approach achieves state-of-the-art results among task-conditional methods while using fewer parameters, striking a favorable balance between performance and parameter count.
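To make the mechanism in the abstract concrete, below is a minimal PyTorch sketch of a prompt-conditioned Transformer block: learnable task-specific prompt tokens join the shared feature tokens before self-attention, so one set of shared weights adapts to each task. The class name, dimensions, prompt length, and the prepend-to-sequence design are illustrative assumptions, not the paper's actual implementation.

    import torch
    import torch.nn as nn

    class PromptConditionedBlock(nn.Module):
        # Shared Transformer block; only the per-task prompts differ across tasks.
        def __init__(self, dim=256, num_heads=8, num_tasks=4, prompt_len=8):
            super().__init__()
            # One small learnable prompt per task (hypothetical sizes).
            self.prompts = nn.Parameter(0.02 * torch.randn(num_tasks, prompt_len, dim))
            self.norm1 = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

        def forward(self, x, task_id):
            # x: (batch, tokens, dim) patch features shared by all tasks.
            b = x.shape[0]
            p = self.prompts[task_id].expand(b, -1, -1)        # (batch, prompt_len, dim)
            h = torch.cat([p, x], dim=1)                       # prompts join the token sequence
            q = self.norm1(h)
            h = h + self.attn(q, q, q, need_weights=False)[0]  # prompt-conditioned self-attention
            h = h + self.mlp(self.norm2(h))
            return h[:, p.shape[1]:]                           # drop prompts, return adapted features

    # Same shared weights, conditioned on two different dense tasks.
    block = PromptConditionedBlock()
    tokens = torch.randn(2, 196, 256)
    seg = block(tokens, task_id=0)    # e.g. semantic segmentation branch
    depth = block(tokens, task_id=1)  # e.g. depth estimation branch

Under these assumed sizes, each additional task costs only prompt_len x dim = 2,048 extra parameters in this block, which is the parameter-efficiency argument the abstract makes.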
Pages: 6375-6385
Number of Pages: 11
Related Papers
50 records in total
  • [1] Multi-Task Learning With Multi-Query Transformer for Dense Prediction
    Xu, Yangyang
    Li, Xiangtai
    Yuan, Haobo
    Yang, Yibo
    Zhang, Lefei
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (02) : 1228 - 1240
  • [2] DeMT: Deformable Mixer Transformer for Multi-Task Learning of Dense Prediction
    Xu, Yangyang
    Yang, Yibo
    Zhang, Lefei
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023: 3072 - 3080
  • [3] TFUT: Task fusion upward transformer model for multi-task learning on dense prediction
    Xin, Zewei
    Sirejiding, Shalayiding
    Lu, Yuxiang
    Ding, Yue
    Wang, Chunlin
    Alsarhan, Tamam
    Lu, Hongtao
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 244
  • [4] Contrastive Multi-Task Dense Prediction
    Yang, Siwei
    Ye, Hanrong
    Xu, Dan
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023: 3190 - 3197
  • [5] MTLFormer: Multi-Task Learning Guided Transformer Network for Business Process Prediction
    Wang, Jiaojiao
    Huang, Jiawei
    Ma, Xiaoyu
    Li, Zhongjin
    Wang, Yaqi
    Yu, Dingguo
    IEEE ACCESS, 2023, 11 : 76722 - 76738
  • [6] Inverted Pyramid Multi-task Transformer for Dense Scene Understanding
    Ye, Hanrong
    Xu, Dan
    COMPUTER VISION - ECCV 2022, PT XXVII, 2022, 13687 : 514 - 530
  • [7] Multi-Task Learning with Knowledge Distillation for Dense Prediction
    Xu, Yangyang
    Yang, Yibo
    Zhang, Lefei
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 21493 - 21502
  • [8] Multi-Task Learning for Dense Prediction Tasks: A Survey
    Vandenhende, Simon
    Georgoulis, Stamatios
    Van Gansbeke, Wouter
    Proesmans, Marc
    Dai, Dengxin
    Van Gool, Luc
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (07) : 3614 - 3633
  • [9] Exploring Relational Context for Multi-Task Dense Prediction
    Bruggemann, David
    Kanakis, Menelaos
    Obukhov, Anton
    Georgoulis, Stamatios
    Van Gool, Luc
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 15849 - 15858
  • [10] Multi-Task Transformer Visualization to build Trust for Clinical Outcome Prediction
    Antweiler, Dario
    Gallusser, Florian
    Fuchs, Georg
2023 WORKSHOP ON VISUAL ANALYTICS IN HEALTHCARE, VAHC, 2023: 21 - 26