Prompt Guided Transformer for Multi-Task Dense Prediction

Cited by: 6
Authors
Lu, Yuxiang [1]
Sirejiding, Shalayiding [1]
Ding, Yue [1]
Wang, Chunlin [2]
Lu, Hongtao [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[2] Chuxiong Normal Univ, Sch Informat Sci & Technol, Chuxiong 675099, Peoples R China
Keywords
Multi-task learning; dense prediction; prompting; vision transformer
DOI
10.1109/TMM.2024.3349865
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Task-conditional architectures offer an advantage in parameter efficiency but fall short in performance compared to state-of-the-art multi-decoder methods. How to trade off performance against model parameters is an important and difficult problem. In this paper, we introduce a simple and lightweight task-conditional model called Prompt Guided Transformer (PGT) to address this challenge. Our approach designs a Prompt-conditioned Transformer block that incorporates task-specific prompts into the self-attention mechanism, achieving global dependency modeling and parameter-efficient feature adaptation across multiple tasks. This block is integrated into both the shared encoder and the decoder, enhancing the capture of intra- and inter-task features. Moreover, we design a lightweight decoder that further reduces parameter usage, accounting for only 2.7% of the total model parameters. Extensive experiments on two multi-task dense prediction benchmarks, PASCAL-Context and NYUD-v2, demonstrate that our approach achieves state-of-the-art results among task-conditional methods while using fewer parameters, striking a favorable balance between performance and model size.
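To make the abstract's mechanism concrete, the following is a minimal PyTorch sketch of a prompt-conditioned self-attention block. It is not the authors' released implementation: the class name, the prompt count, and the choice to prepend prompt tokens to the patch tokens are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class PromptConditionedBlock(nn.Module):
    """Sketch of a prompt-conditioned Transformer block: learnable
    task-specific prompt tokens join the patch tokens inside
    self-attention, so one shared block adapts to each task."""

    def __init__(self, dim, num_heads, num_tasks, num_prompts=8):
        super().__init__()
        # A small bank of learnable prompt tokens per task (size is assumed).
        self.prompts = nn.Parameter(0.02 * torch.randn(num_tasks, num_prompts, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x, task_id):
        # x: (batch, num_patches, dim). Prepend the task's prompts so
        # attention models global dependencies between prompts and patches.
        b = x.shape[0]
        p = self.prompts[task_id].unsqueeze(0).expand(b, -1, -1)
        z = self.norm1(torch.cat([p, x], dim=1))
        z, _ = self.attn(z, z, z, need_weights=False)
        # Drop the prompt outputs; they only steer the patch features.
        x = x + z[:, p.shape[1]:]
        x = x + self.mlp(self.norm2(x))
        return x

# The same shared weights serve every task, switched by task_id.
block = PromptConditionedBlock(dim=384, num_heads=6, num_tasks=4)
tokens = torch.randn(2, 196, 384)          # e.g. 14x14 patch tokens
seg_feats = block(tokens, task_id=0)       # conditioned for segmentation
depth_feats = block(tokens, task_id=1)     # same weights, depth task
```

Under this reading, the prompts are the only task-specific parameters; switching task_id re-purposes the same shared weights, which is the source of the parameter efficiency claimed in the abstract.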
Pages: 6375-6385
Page count: 11
Related Papers
50 records in total
  • [31] PAG-Unet: multi-task dense scene understanding with pixel-attention-guided Unet
    Xu, Yi
    Li, Changhao
    APPLIED INTELLIGENCE, 2025, 55 (06)
  • [32] Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer
    Lee, Namkyeong
    Noh, Heewoong
    Kim, Sungwon
    Hyun, Dongmin
    Na, Gyoung S.
    Park, Chanyoung
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [33] Informed Truthfulness in Multi-Task Peer Prediction
    Shnayder, Victor
    Agarwal, Arpit
    Frongillo, Rafael
    Parkes, David C.
    EC'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON ECONOMICS AND COMPUTATION, 2016, : 179 - 196
  • [34] Multi-Task CNN Model for Attribute Prediction
    Abdulnabi, Abrar H.
    Wang, Gang
    Lu, Jiwen
    Jia, Kui
    IEEE TRANSACTIONS ON MULTIMEDIA, 2015, 17 (11) : 1949 - 1959
  • [35] MultiEmo: Multi-task framework for emoji prediction
    Lee, SangEun
    Jeong, Dahye
    Park, Eunil
    KNOWLEDGE-BASED SYSTEMS, 2022, 242
  • [36] Hierarchical Prompt Tuning for Few-Shot Multi-Task Learning
    Liu, Jingping
    Chen, Tao
    Liang, Zujie
    Jiang, Haiyun
    Xiao, Yanghua
    Wei, Feng
    Qian, Yuxi
    Hao, Zhenghong
    Han, Bing
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 1556 - 1565
  • [37] MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning
    Xu, Xiaogang
    Zhao, Hengshuang
    Vineet, Vibhav
    Lim, Ser-Nam
    Torralba, Antonio
    COMPUTER VISION - ECCV 2022, PT XXVII, 2022, 13687 : 304 - 321
  • [38] Guided Learning: A New Paradigm for Multi-task Classification
    Fu, Jingru
    Zhang, Lei
    Zhang, Bob
    Jia, Wei
    BIOMETRIC RECOGNITION, CCBR 2018, 2018, 10996 : 239 - 246
  • [39] PERCEIVER-ACTOR: A Multi-Task Transformer for Robotic Manipulation
    Shridhar, Mohit
    Manuelli, Lucas
    Fox, Dieter
    CONFERENCE ON ROBOT LEARNING, 2022, 205 : 785 - 799
  • [40] Multi-Task Transformer with LSTM Model for Question Set Generation
    Institute of Electrical and Electronics Engineers Inc.