Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation

Times cited: 0

Authors:

Yan, Yichen [1,2]

He, Xingjian [1]

Chen, Sihan [2]

Liu, Jing [1,2]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024 | 2024

Funding:

National Natural Science Foundation of China

Keywords:

referring image segmentation; iterative calibration; language reconstruction

DOI:

10.1145/3652583.3658095

CLC classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Referring image segmentation aims to segment the object referred to by a natural language expression from an image. The primary challenge lies in efficiently propagating fine-grained semantic information from textual features to visual features. Many recent works use a Transformer to address this challenge. However, conventional Transformer decoders can distort linguistic information as layers deepen, leading to suboptimal results. In this paper, we introduce CRFormer, a model that iteratively calibrates multi-modal features in the Transformer decoder. We first generate language queries using vision features, emphasizing different aspects of the input language. We then propose a novel Calibration Decoder (CDec), in which the multi-modal features are iteratively calibrated by the input language features. In CDec, the output of each decoder layer and the original language features are used to generate new queries for continuous calibration, which gradually updates the language features. Building on CDec, we introduce a Language Reconstruction Module and a reconstruction loss. This module uses the queries from the final decoder layer to reconstruct the input language and compute the reconstruction loss, which further prevents the language information from being lost or distorted. Experiments on the RefCOCO, RefCOCO+, and G-Ref datasets consistently show that our approach outperforms state-of-the-art methods.
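The abstract describes the calibration loop and the reconstruction loss only at a high level. The following is a minimal, hypothetical sketch in plain Python of those two ideas, not the authors' CRFormer implementation. Every detail below is an assumption: single-head scaled dot-product attention with no learned projections, an averaged residual `(a + b) / 2` standing in for residual connections plus normalization, one query per vision token, and a mean-pooled sentence vector as the reconstruction target with a mean-squared-error loss. The function names (`calibration_decoder`, `reconstruction_loss`, `mix`) are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]  # each row is one token's feature vector


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Numerically stable row-wise softmax."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention (assumption: no learned projections)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def mix(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical averaged residual (a + b) / 2, standing in for residual + norm."""
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mean_rows(m: Matrix) -> List[float]:
    return [sum(col) / len(m) for col in zip(*m)]


def calibration_decoder(vision: Matrix, language: Matrix, num_layers: int = 3) -> Matrix:
    # Language queries generated from vision features: each vision token attends
    # over the words, so different queries weight different parts of the expression.
    queries = attention(vision, language, language)
    for _ in range(num_layers):
        # Decoder layer: queries gather visual evidence by cross-attention.
        layer_out = mix(queries, attention(queries, vision, vision))
        # Calibration: the layer output re-reads the ORIGINAL language features
        # to form the next layer's queries.
        queries = mix(layer_out, attention(layer_out, language, language))
    return queries


def reconstruction_loss(final_queries: Matrix, language: Matrix) -> float:
    """Assumed MSE between pooled final queries and the pooled input language."""
    recon = mean_rows(final_queries)
    target = mean_rows(language)
    return sum((r - t) ** 2 for r, t in zip(recon, target)) / len(target)


if __name__ == "__main__":
    vision = [[0.2, 0.1, 0.0], [0.9, 0.3, 0.4], [0.1, 0.8, 0.5]]  # 3 visual tokens
    language = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]                 # 2 word features
    q = calibration_decoder(vision, language)
    print(len(q), len(q[0]), reconstruction_loss(q, language))
```

In this toy version, re-reading the original language features at every layer is what keeps deep queries tied to the expression, and the reconstruction loss penalizes any remaining drift; a real model would use learned projections, multi-head attention, and layer normalization.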


Pages: 451-459

Page count: 9
