Dual Context Perception Transformer for Referring Image Segmentation

Cited by: 0
Authors
Kong, Yuqiu [1]
Liu, Junhua [1]
Yao, Cuili [1]
Affiliations
[1] Dalian Univ Technol, Dalian 116024, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024 | 2025, Vol. 15035
Funding
National Natural Science Foundation of China
Keywords
Referring image segmentation; Vision-linguistic alignment; Multi-modal fusion;
DOI
10.1007/978-981-97-8620-6_15
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Referring image segmentation segments target objects in an image according to language expressions. Existing methods mainly integrate multi-modal features with attention mechanisms. However, most of them tend to favor the features of a single modality during the fusion stage and fall short in exploring cross-modal contextual information, which is critical for accurately localizing target regions. To this end, we propose a novel architecture named Dual Context Perception Transformer (DCPformer), which considers both visual and linguistic contextual information during the fusion and reasoning stages. Specifically, a Cross-modal Context-aware Perception Module (CCPM) is designed to model cross-modal alignment in a unified visual-linguistic representation space. Furthermore, we propose an Information Feedback Module (IFM) that generates a rectification mask based on deep-scale features and uses it to filter out signals unrelated to the target object from features at shallower scales. Extensive experiments show that the proposed DCPformer achieves state-of-the-art performance against existing methods on three challenging benchmarks.
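The feedback idea in the abstract (a mask derived from deep-scale features gating shallower features) can be illustrated with a rough sketch. This is not the authors' IFM: the abstract gives no implementation details, so the channel-mean plus sigmoid projection, the nearest-neighbour upsampling, and all function names below are assumptions chosen only to make the data flow concrete.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rectification_mask(deep_feat):
    """Map a C x H x W deep feature (nested lists) to an H x W mask in (0, 1).

    Assumption: a channel mean followed by a sigmoid stands in for whatever
    learned projection the real module uses.
    """
    c = len(deep_feat)
    h, w = len(deep_feat[0]), len(deep_feat[0][0])
    return [[sigmoid(sum(deep_feat[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]

def upsample_nearest(mask, factor):
    """Nearest-neighbour upsampling so the mask matches the shallow resolution."""
    h, w = len(mask), len(mask[0])
    return [[mask[i // factor][j // factor] for j in range(w * factor)]
            for i in range(h * factor)]

def filter_shallow(shallow_feat, mask):
    """Scale every channel of the shallow feature by the mask, element-wise."""
    return [[[v * mask[i][j] for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in shallow_feat]

# Toy input: the deep feature responds strongly on the left, weakly on the right.
deep = [[[4.0, -4.0]]]                     # 1 channel, 1 x 2
shallow = [[[1.0] * 4 for _ in range(2)]]  # 1 channel, 2 x 4
mask = upsample_nearest(rectification_mask(deep), 2)
filtered = filter_shallow(shallow, mask)
```

In this toy case the left half of the shallow feature is kept almost unchanged while the right half is suppressed towards zero, which is the kind of filtering the abstract attributes to the rectification mask.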
Pages: 216-230
Number of pages: 15