DIAL: Dense Image-Text ALignment for Weakly Supervised Semantic Segmentation

Authors
Jang, Soojin [1]
Yun, Jungmin [2]
Kwon, Junehyoung [2]
Lee, Eunju [1]
Kim, Youngbin [1, 2]
Affiliations
[1] Chung Ang Univ, Grad Sch Adv Imaging Sci Multimedia & Film, Seoul, South Korea
[2] Chung Ang Univ, Dept Artificial Intelligence, Seoul, South Korea
Source
Funding
National Research Foundation, Singapore;
Keywords
weakly supervised semantic segmentation; image-level labels supervision; single-stage framework;
DOI
10.1007/978-3-031-72890-7_15
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Weakly supervised semantic segmentation (WSSS) approaches typically rely on class activation maps (CAMs) for initial seed generation, which often fail to capture global context due to limited supervision from image-level labels. To address this issue, we introduce DALNet, a Dense Alignment Learning Network that leverages text embeddings to enhance the comprehensive understanding and precise localization of objects across different levels of granularity. Our key insight is to employ a dual-level alignment strategy: (1) Global Implicit Alignment (GIA), which captures global semantics by maximizing the similarity between the class token and the corresponding text embeddings while minimizing its similarity with background embeddings, and (2) Local Explicit Alignment (LEA), which improves object localization by utilizing spatial information from patch tokens. Moreover, we propose a cross-contrastive learning approach that aligns foreground features between the image and text modalities while separating them from the background, encouraging activation in missing regions and suppressing distractions. Through extensive experiments on the PASCAL VOC and MS COCO datasets, we demonstrate that DALNet significantly outperforms state-of-the-art WSSS methods. In particular, as a single-stage method, our approach allows for a more efficient end-to-end process.
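The dual-level alignment described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the loss form (one minus cosine similarity for foreground, clipped cosine similarity for background), and the tensor shapes are all assumptions chosen for illustration, and the record does not specify the actual losses or encoders.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def global_alignment_loss(class_token, fg_text, bg_text):
    """Hypothetical global alignment objective (assumed form, not from the paper).

    class_token: (D,)   image-level token from a vision transformer
    fg_text:     (K, D) text embeddings of the classes present in the image
    bg_text:     (B, D) text embeddings of background concepts
    """
    c = l2_normalize(class_token)
    fg_sim = l2_normalize(fg_text) @ c  # (K,) similarity to foreground text
    bg_sim = l2_normalize(bg_text) @ c  # (B,) similarity to background text
    # Pull the class token toward foreground text, push it away from background text.
    return float((1.0 - fg_sim).mean() + np.maximum(bg_sim, 0.0).mean())


def patch_text_similarity(patch_tokens, text_emb):
    """Hypothetical local alignment signal: one cosine-similarity map per class.

    patch_tokens: (H, W, D) spatial grid of patch tokens
    text_emb:     (K, D)    class text embeddings
    returns:      (K, H, W) similarity maps usable as localization cues
    """
    p = l2_normalize(patch_tokens)
    t = l2_normalize(text_emb)
    return np.einsum("hwd,kd->khw", p, t)


if __name__ == "__main__":
    fg = np.array([[1.0, 0.0, 0.0]])
    bg = np.array([[0.0, 1.0, 0.0]])
    aligned = np.array([1.0, 0.0, 0.0])     # matches the foreground text
    misaligned = np.array([0.0, 1.0, 0.0])  # matches the background text
    print(global_alignment_loss(aligned, fg, bg))
    print(global_alignment_loss(misaligned, fg, bg))
    print(patch_text_similarity(np.ones((2, 2, 3)), fg).shape)
```

In this toy setup the loss is near zero when the class token matches the foreground text embedding and grows when it matches a background embedding, which mirrors the maximize/minimize behavior the abstract attributes to GIA; the per-class similarity maps correspond loosely to the patch-level cue LEA is said to exploit.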
Pages: 248-266
Page count: 19
Related papers
50 in total
  • [1] Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
    Wu, Ji-Jia
    Chang, Andy Chia-Hao
    Chuang, Chi Eh-Yu
    Chen, Chun-Pei
    Liu, Yu-Lun
    Chen, Min-Hung
    Hu, Hou-Ning
    Chuang, Yung-Yu
    Lin, Yen-Yu
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 26784 - 26793
  • [2] Weakly Supervised Learning of Dense Semantic Correspondences and Segmentation
    Ufer, Nikolai
    Lui, Kam To
    Schwarz, Katja
    Warkentin, Paul
    Ommer, Bjoern
    PATTERN RECOGNITION, DAGM GCPR 2019, 2019, 11824 : 456 - 470
  • [3] WEAKLY SUPERVISED ALIGNMENT OF IMAGE MANIFOLDS WITH SEMANTIC TIES
    Tuia, Devis
    2014 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2014, : 3546 - 3549
  • [4] Image-text aggregation for open-vocabulary semantic segmentation
    Cheng, Shengyang
    Huang, Jianyong
    Wang, Xiaodong
    Huang, Lei
    Wei, Zhiqiang
    NEUROCOMPUTING, 2025, 630
  • [5] Image Piece Learning for Weakly Supervised Semantic Segmentation
    Li, Yi
    Guo, Yanqing
    Kao, Yueying
    He, Ran
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2017, 47 (04): : 648 - 659
  • [6] Local Alignment with Global Semantic Consistence Network for Image-Text Matching
    Li, Pengwei
    Wu, Shihua
    Lian, Zhichao
    2022 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2022, : 652 - 657
  • [7] Weakly-Supervised Dual Clustering for Image Semantic Segmentation
    Liu, Yang
    Liu, Jing
    Li, Zechao
    Tang, Jinhui
    Lu, Hanqing
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 2075 - 2082
  • [8] Weakly Supervised Image Semantic Segmentation Based on Clustering Superpixels
    Yan, Xiong
    Liu, Xiaohua
    NINTH INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2017), 2018, 10615
  • [9] Weakly Supervised Semantic Segmentation with a Multi-Image Model
    Vezhnevets, Alexander
    Ferrari, Vittorio
    Buhmann, Joachim M.
    2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2011, : 643 - 650
  • [10] Weakly supervised semantic segmentation for optic disc of fundus image
    Lu, Zheng
    Chen, Dali
    Xue, Dingyu
    Zhang, Shibo
    JOURNAL OF ELECTRONIC IMAGING, 2019, 28 (03)