PAG-Unet: multi-task dense scene understanding with pixel-attention-guided Unet

Authors
Xu, Yi [1 ,2 ]
Li, Changhao [1 ]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, 111 Kowloon Rd, Hefei 230601, Anhui, Peoples R China
[2] Anhui Univ, Key Lab Intelligent Comp & Signal Proc, Minist Educ, 111 Kowloon Rd, Hefei 230601, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-task dense scene understanding; Pixel-attention-guided Unet; Feature enhancement; Task interaction;
DOI
10.1007/s10489-025-06389-2
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-task dense scene understanding is a fundamental research area in computer vision (CV). By jointly performing pixel-level prediction, perception, and reasoning across multiple related tasks, it improves both accuracy and data efficiency. However, some tasks may require more independent feature representations, and excessive sharing can lead to interference between tasks. To address this issue, we propose a novel Pixel-Attention-Guided Unet (PAG-Unet). PAG-Unet incorporates a Pixel-Attention-Guided Fusion module (PAG Fusion) and a Multi-Task Self-Attention module (MTSA) to enhance task-specific feature extraction and reduce task interference. PAG Fusion leverages the relationship between shallow and deep features by using task-specific deep features to calibrate the distribution of shared shallow features. This suppresses background noise and enhances semantic features, so that task-specific features are extracted more fully for each task. MTSA considers both global and local spatial interactions for each task during task interaction, capturing task-specific information and compensating for the loss of crucial details, thus improving prediction accuracy for each task. Our method achieves superior multi-task performance on the New York University Depth v2 (NYUD-v2) and PASCAL Visual Object Classes Context (PASCAL-Context) datasets, with most metrics significantly outperforming previous state-of-the-art methods. The code is available at https://github.com/UPLI-123/Pag-Unet.
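The abstract's description of PAG Fusion, in which task-specific deep features calibrate shared shallow features, can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a simple gate: a per-pixel sigmoid of the channel-wise similarity between the two feature maps, used to re-weight the shallow features. The function name `pixel_attention_calibrate`, the gating formula, and the assumption that both maps already share the same shape are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def pixel_attention_calibrate(shallow, deep):
    """Hypothetical pixel-attention gate (an assumption, not the paper's code).

    shallow: shared shallow features, array of shape (C, H, W)
    deep:    task-specific deep features, same shape (assumed already
             upsampled and projected to match the shallow features)
    """
    if shallow.shape != deep.shape:
        raise ValueError("shallow and deep features must have the same shape")
    # Per-pixel similarity between the two maps, summed over channels.
    similarity = (shallow * deep).sum(axis=0, keepdims=True)  # (1, H, W)
    # Gate in (0, 1): high where the deep, task-specific features agree
    # with the shallow ones, low elsewhere (e.g. background clutter).
    gate = sigmoid(similarity)
    # Re-weight the shared shallow features pixel by pixel.
    return gate * shallow


rng = np.random.default_rng(0)
shallow = rng.normal(size=(4, 8, 8))
deep = rng.normal(size=(4, 8, 8))
calibrated = pixel_attention_calibrate(shallow, deep)
print(calibrated.shape)  # (4, 8, 8)
```

Under this assumed gate, pixels where the deep features carry no signal are passed through at half strength, and each task would apply its own deep features to the same shared shallow map.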
Pages: 16
Related papers
50 in total
  • [21] Pool-Unet: A Novel Tongue Image Segmentation Method Based on Pool-Former and Multi-Task Mask Learning
    Li, Xiangrun
    Sheng, Qiyu
    Zhou, Guangda
    Wei, Jialong
    Shi, Yanmin
    Zhao, Zhen
    Li, Yongwei
    Li, Xingfeng
    Liu, Yang
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2024, E107A (10) : 1609 - 1620
  • [22] InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding
    Ye, Hanrong
    Xu, Dan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 7493 - 7508
  • [23] MVANet: Multi-Task Guided Multi-View Attention Network for Chinese Food Recognition
    Liang, Haozan
    Wen, Guihua
    Hu, Yang
    Luo, Mingnan
    Yang, Pei
    Xu, Yingxue
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 3551 - 3561
  • [24] MVANet: Multi-Task Guided Multi-View Attention Network for Chinese Food Recognition
    Liang, Haozan
    Wen, Guihua
    Hu, Yang
    Luo, Mingnan
    Yang, Pei
    Xu, Yingxue
    IEEE Transactions on Multimedia, 2021, 23 : 3551 - 3561
  • [25] Semi-Supervised Learning for Multi-Task Scene Understanding by Neural Graph Consensus
    Leordeanu, Marius
    Pirvu, Mihai Cristian
    Costea, Dragos
    Marcu, Alina E.
    Slusanschi, Emil
    Sukthankar, Rahul
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1882 - 1892
  • [26] HF-UNet: Learning Hierarchically Inter-Task Relevance in Multi-Task U-Net for Accurate Prostate Segmentation in CT Images
    He, Kelei
    Lian, Chunfeng
    Zhang, Bing
    Zhang, Xin
    Cao, Xiaohuan
    Nie, Dong
    Gao, Yang
    Zhang, Junfeng
    Shen, Dinggang
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2021, 40 (08) : 2118 - 2128
  • [27] Attention Guided Multi-Task Network for Joint CFO and Channel Estimation in OFDM Systems
    Chen, Zhuo
    Liu, Zhiang
    Geng, Xue
    Zhao, Yingxin
    Wu, Hong
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2024, 23 (01) : 321 - 333
  • [28] Attention-guided model for mitral regurgitation analysis based on multi-task learning
    Wu, Jing
    Ge, Zhenyi
    Huang, Helin
    Wang, Hairui
    Li, Nan
    Hu, Chunqiang
    Pan, Cuizhen
    Wu, Xiaomei
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 101
  • [29] Task-aware asynchronous multi-task model with class incremental contrastive learning for surgical scene understanding
    Seenivasan, Lalithkumar
    Islam, Mobarakol
    Xu, Mengya
    Lim, Chwee Ming
    Ren, Hongliang
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2023, 18 (05) : 921 - 928
  • [30] Task-aware asynchronous multi-task model with class incremental contrastive learning for surgical scene understanding
    Lalithkumar Seenivasan
    Mobarakol Islam
    Mengya Xu
    Chwee Ming Lim
    Hongliang Ren
    International Journal of Computer Assisted Radiology and Surgery, 2023, 18 : 921 - 928