Vision-Aware Language Reasoning for Referring Image Segmentation

Authors
Fayou Xu
Bing Luo
Chao Zhang
Li Xu
Mingxing Pu
Bo Li
Affiliations
[1] Xihua University, School of Computer and Software Engineering
[2] Sichuan Police College, Key Laboratory of Intelligent Policing
[3] Xihua University, School of Science
Source
Neural Processing Letters | 2023, Vol. 55
Keywords
Referring image segmentation; Vision and language; Explainable language-structure reasoning
DOI
Not available
Abstract
Referring image segmentation is a multimodal task that aims to segment, from an image, the object described by a paired natural-language expression. However, the diversity of language annotations tends to cause semantic ambiguity, which makes the semantic representation produced by the language feature encoder imprecise. Existing methods do not correct the language encoding module, so semantic errors in the language features cannot be remedied in subsequent processing, which results in semantic deviation. To this end, we propose a vision-aware language reasoning model. Intuitively, the segmentation result can be used to guide the reconstruction of language features, and this reconstruction can be expressed as a tree-structured recursive process. Specifically, we design a language reasoning encoding module and a mask loopback optimization module to optimize the language encoding tree. The feature weights of the tree nodes are learned through backpropagation. Traditional attention modules that match local words to visual regions easily introduce noisy regions. To overcome this problem, we use global language prior information to compute the importance of different words and use these importances to further weight the visual region features. We implement this as a language-aware vision attention module. Experimental results on four benchmark datasets show that the proposed method improves performance.
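The abstract describes the language-aware vision attention module only at a high level. The following is a minimal sketch under stated assumptions and is not the authors' implementation. It assumes three things:

1. The global language prior is the mean of the word features.
2. Word importance is a scaled dot-product softmax against that prior.
3. Each visual region is reweighted by a softmax over its similarity to the importance-weighted language query.

The function `language_aware_vision_attention` and its helpers are hypothetical names. Features are plain Python lists so the example has no dependencies.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def language_aware_vision_attention(
    words: List[Vector], regions: List[Vector]
) -> Tuple[List[float], List[float], List[Vector]]:
    """Hypothetical sketch: weight visual regions using word importance.

    words:   non-empty list of word features, each of dimension d (assumed).
    regions: non-empty list of visual region features, same dimension d.
    Returns (word_weights, region_weights, reweighted_regions).
    """
    d = len(words[0])
    scale = math.sqrt(d)
    # Assumption: the "global language prior" is the mean-pooled word feature.
    global_prior = mean(words)
    # Importance of each word relative to the global prior (sums to 1).
    word_weights = softmax([dot(w, global_prior) / scale for w in words])
    # Importance-weighted language query vector.
    query = [sum(a * w[k] for a, w in zip(word_weights, words)) for k in range(d)]
    # Relevance of each visual region to the language query (sums to 1).
    region_weights = softmax([dot(v, query) / scale for v in regions])
    # Scale each region feature by its relevance weight.
    reweighted = [[b * x for x in v] for b, v in zip(region_weights, regions)]
    return word_weights, region_weights, reweighted


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    regions = [[2.0, 2.0], [-1.0, -1.0]]
    ww, rw, weighted = language_aware_vision_attention(words, regions)
    print("word weights:", ww)
    print("region weights:", rw)
```

In a trained model, words and regions would pass through learned projections and the computation would run on batched tensors. This list-based version only illustrates the data flow, from global prior to word importance to region weights. In the toy input, the word feature that is most aligned with the global prior gets the largest weight. The region that points in the same direction as the resulting query is up-weighted.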
Pages: 11313-11331
Number of pages: 18