PAG-Unet: multi-task dense scene understanding with pixel-attention-guided Unet

Cited by: 0
Authors
Xu, Yi [1 ,2 ]
Li, Changhao [1 ]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, 111 Kowloon Rd, Hefei 230601, Anhui, Peoples R China
[2] Anhui Univ, SKey Lab Intelligent Comp & Signal Proc, Minist Educ, 111 Kowloon Rd, Hefei 230601, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-task dense scene understanding; Pixel-attention-guided Unet; Feature enhancement; Task interaction;
DOI
10.1007/s10489-025-06389-2
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-task dense scene understanding is a fundamental research area in computer vision (CV). By jointly predicting, perceiving, and reasoning about multiple related tasks at the pixel level, it improves both accuracy and data efficiency. However, it faces the challenge that some tasks may require more independent feature representations, and excessive sharing can lead to interference between tasks. To address this issue, we propose a novel Pixel-Attention-Guided Unet (PAG-Unet). PAG-Unet incorporates a Pixel-Attention-Guided Fusion module (PAG Fusion) and a Multi-Task Self-Attention module (MTSA) to enhance task-specific feature extraction and reduce task interference. PAG Fusion leverages the relationship between shallow and deep features by using task-specific deep features to calibrate the distribution of shared shallow features. This suppresses background noise and enhances semantic features, thereby fully extracting task-specific features for each task and achieving feature enhancement. MTSA considers both global and local spatial interactions for each task during task interaction, capturing task-specific information and compensating for the loss of crucial details, thus improving prediction accuracy for each task. Our method achieves superior multi-task performance on the New York University Depth v2 (NYUD-v2) and PASCAL Visual Object Classes Context (PASCAL-Context) datasets, with most metrics significantly outperforming previous state-of-the-art methods. The code is available at https://github.com/UPLI-123/Pag-Unet.
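The abstract describes PAG Fusion as using task-specific deep features to produce a per-pixel calibration of shared shallow features. The record does not give the exact formulation, so the sketch below only illustrates the general idea of attention-guided calibration with NumPy; the function `pag_fusion` and the channel-mixing weight `w` are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pag_fusion(shallow, deep, w):
    """Hypothetical sketch: calibrate shared shallow features (K, H, W)
    with a per-pixel attention map derived from task-specific deep
    features (C, H, W) via channel-mixing weights w of shape (C, K)."""
    # per-pixel, per-channel attention in [0, 1] from the deep features
    attn = sigmoid(np.einsum('chw,ck->khw', deep, w))
    # gate the shallow features (suppressing background) and keep a
    # residual path so the shared representation is not discarded
    return shallow * attn + shallow
```

With zero deep features the gate is sigmoid(0) = 0.5 everywhere, so the output is 1.5x the shallow input; in a trained model the gate would instead emphasize task-relevant pixels.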
Pages: 16
Related Papers
50 records
  • [31] Multi-Task Deep Learning Design and Training Tool for Unified Visual Driving Scene Understanding
    Won, Woong-Jae
    Kim, Tae Hun
    Kwon, Soon
    2019 19TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2019), 2019, : 356 - 360
  • [32] One-Pass Multi-Task Networks With Cross-Task Guided Attention for Brain Tumor Segmentation
    Zhou, Chenhong
    Ding, Changxing
    Wang, Xinchao
    Lu, Zhentai
    Tao, Dacheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 4516 - 4529
  • [33] Deep Multi-task Image Clustering with Attention-Guided Patch Filtering and Correlation Mining
    Tian, Zhongyao
    Li, Kai
    Peng, Jinjia
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT IV, 2024, 14428 : 126 - 138
  • [34] MLGNet: Multi-Task Learning Network with Attention-Guided Mechanism for Segmenting Agricultural Fields
    Luo, Weiran
    Zhang, Chengcai
    Li, Ying
    Yan, Yaning
    REMOTE SENSING, 2023, 15 (16)
  • [35] An attention-guided and prior-embedded approach with multi-task learning for shadow detection
    Zhang, Shihui
    Li, He
    Kong, Weihang
    Zhang, Xiaowei
    Ren, Weidong
    KNOWLEDGE-BASED SYSTEMS, 2020, 194
  • [36] UniNet: A Unified Scene Understanding Network and Exploring Multi-Task Relationships through the Lens of Adversarial Attacks
    Gurulingan, Naresh Kumar
    Arani, Elahe
    Zonooz, Bahram
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 2239 - 2248
  • [37] Multi-task Learning for Bi-temporal Remote Sensing Scene Parsing via Patch-pixel Representation
    Fu, Chenqin
    Bao, Tengfei
    Lv, Liang
    Sirajidin, Salayidin
    Fang, Tao
    Huo, Hong
ICMLC 2020: 2020 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, 2020, : 360 - 367
  • [38] AdaMT-Net: An Adaptive Weight Learning Based Multi-Task Learning Model For Scene Understanding
    Jha, Ankit
    Kumar, Awanish
    Banerjee, Biplab
    Chaudhuri, Subhasis
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 3027 - 3035
  • [39] Understanding the Teaching Styles by an Attention based Multi-task Cross-media Dimensional modelling
    Zhou, Suping
    Jia, Jia
    Yin, Yufeng
    Li, Xiang
    Yao, Yang
    Zhang, Ying
    Ye, Zeyang
    Lei, Kehua
    Huang, Yan
    Shen, Jialie
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1322 - 1330
  • [40] Deep Floor Plan Recognition Using a Multi-Task Network with Room-Boundary-Guided Attention
    Zeng, Zhiliang
    Li, Xianzhi
    Yu, Ying Kin
    Fu, Chi-Wing
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 9095 - 9103