PAG-Unet: multi-task dense scene understanding with pixel-attention-guided Unet

Times cited: 0
Authors
Xu, Yi [1, 2]
Li, Changhao [1]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, 111 Kowloon Rd, Hefei 230601, Anhui, Peoples R China
[2] Anhui Univ, Key Lab Intelligent Comp & Signal Proc, Minist Educ, 111 Kowloon Rd, Hefei 230601, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-task dense scene understanding; Pixel-attention-guided Unet; Feature enhancement; Task interaction;
DOI
10.1007/s10489-025-06389-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Multi-task dense scene understanding is a fundamental research area in computer vision (CV). By jointly predicting, perceiving, and reasoning about multiple related pixel-level tasks, it improves both accuracy and data efficiency. However, some tasks may require more independent feature representations, and excessive sharing can cause interference between tasks. To address this issue, we propose a novel Pixel-Attention-Guided Unet (PAG-Unet). PAG-Unet incorporates a Pixel-Attention-Guided Fusion module (PAG Fusion) and a Multi-Task Self-Attention module (MTSA) to enhance task-specific feature extraction and reduce task interference. PAG Fusion leverages the relationship between shallow and deep features by using task-specific deep features to calibrate the distribution of shared shallow features. This suppresses background noise and enhances semantic features, so that task-specific features are extracted more fully for each task. MTSA considers both global and local spatial interactions for each task during task interaction, capturing task-specific information and compensating for the loss of crucial details, thus improving prediction accuracy for each task. Our method achieves superior multi-task performance on the New York University Depth v2 (NYUD-v2) and PASCAL Visual Object Classes Context (PASCAL-Context) datasets, with most metrics significantly outperforming previous state-of-the-art methods. The code is available at https://github.com/UPLI-123/Pag-Unet.
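The abstract does not give the equations of PAG Fusion, so the following is only a minimal illustrative sketch of the general idea of using deep features to re-weight shallow features pixel by pixel. Everything in it is an assumption made for illustration and is not taken from the paper or its repository: the function names (`upsample_nearest`, `pixel_attention_fusion`), the dot-product-plus-sigmoid gate, nearest-neighbour upsampling, and the requirement that both feature maps have the same channel count.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map (hypothetical helper)."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def pixel_attention_fusion(shallow, deep):
    """Re-weight shared shallow features with a per-pixel gate from deep features.

    Assumptions (not from the paper): both maps have C channels, and the
    shallow resolution is an integer multiple of the deep resolution.
    shallow: (C, H, W) shared shallow features
    deep:    (C, h, w) task-specific deep features
    """
    factor = shallow.shape[1] // deep.shape[1]
    deep_up = upsample_nearest(deep, factor)  # (C, H, W)
    # Per-pixel similarity between shallow and deep features across channels.
    similarity = (shallow * deep_up).sum(axis=0, keepdims=True)  # (1, H, W)
    gate = sigmoid(similarity)  # values in (0, 1)
    # Pixels that agree with the deep, semantic features are kept;
    # pixels that disagree are attenuated.
    calibrated = gate * shallow
    return calibrated, gate


rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 16, 16))
deep = rng.standard_normal((8, 4, 4))
calibrated, gate = pixel_attention_fusion(shallow, deep)
print(calibrated.shape, gate.shape)
```

In this sketch the gate is shared across channels, so each pixel of the shallow map is scaled by a single value between 0 and 1. The actual module in the paper may use learned projections and a different fusion rule.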
Pages: 16
Related papers
50 in total
  • [1] Multi-task Learning for Neonatal Brain Segmentation Using 3D Dense-Unet with Dense Attention Guided by Geodesic Distance
    Toan Duc Bui
    Wang, Li
    Chen, Jian
    Lin, Weili
    Li, Gang
    Shen, Dinggang
    DOMAIN ADAPTATION AND REPRESENTATION TRANSFER AND MEDICAL IMAGE LEARNING WITH LESS LABELS AND IMPERFECT DATA, DART 2019, MIL3ID 2019, 2019, 11795 : 243 - 251
  • [2] MT-UNET: A NOVEL U-NET BASED MULTI-TASK ARCHITECTURE FOR VISUAL SCENE UNDERSTANDING
    Jha, Ankit
    Kumar, Awanish
    Pande, Shivam
    Banerjee, Biplab
    Chaudhuri, Subhasis
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2191 - 2195
  • [3] HirMTL: Hierarchical Multi-Task Learning for dense scene understanding
    Luo, Huilan
    Hu, Weixia
    Wei, Yixiao
    He, Jianlong
    Yu, Minghao
    NEURAL NETWORKS, 2025, 181
  • [4] Inverted Pyramid Multi-task Transformer for Dense Scene Understanding
    Ye, Hanrong
    Xu, Dan
    COMPUTER VISION - ECCV 2022, PT XXVII, 2022, 13687 : 514 - 530
  • [5] UNET-Based Multi-Task Architecture for Brain Lesion Segmentation
    Abolvardi, Ava Assadi
    Hamey, Len
    Ho-Shon, Kevin
    2020 DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA), 2020,
  • [6] A New Multi-task Network for Autonomous Driving: Efficientnetv1_Unet
    Li, Jiatian
    Peng, Jiangtao
    Meng, Ran
    Long, Qian
    Luo, Xinyu
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XI, ICIC 2024, 2024, 14872 : 441 - 451
  • [7] MTMamba: Enhancing Multi-task Dense Scene Understanding by Mamba-Based Decoders
    Lin, Baijiong
    Jiang, Weisen
    Chen, Pengguang
    Zhang, Yu
    Liu, Shu
    Chen, Ying-Cong
    COMPUTER VISION - ECCV 2024, PT LXX, 2025, 15128 : 314 - 330
  • [8] MLSA-UNET: END-TO-END MULTI-LEVEL SPATIAL ATTENTION GUIDED UNET FOR INDUSTRIAL DEFECT SEGMENTATION
    Lin, Dongyun
    Cheng, Yi
    Li, Yiqun
    Prasad, Shitala
    Guo, Aiyuan
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 441 - 445
  • [9] A multi-scale, multi-task fusion UNet model for accurate breast tumor segmentation
    Dai, Shuo
    Liu, Xueyan
    Wei, Wei
    Yin, Xiaoping
    Qiao, Lishan
    Wang, Jianing
    Zhang, Yu
    Hou, Yan
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2025, 258
  • [10] Prompt Guided Transformer for Multi-Task Dense Prediction
    Lu, Yuxiang
    Sirejiding, Shalayiding
    Ding, Yue
    Wang, Chunlin
    Lu, Hongtao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 6375 - 6385