Vision-Aware Language Reasoning for Referring Image Segmentation

Cited by: 0
Authors
Fayou Xu
Bing Luo
Chao Zhang
Li Xu
Mingxing Pu
Bo Li
Affiliations
[1] Xihua University, School of Computer and Software Engineering
[2] Sichuan Police College, Key Laboratory of Intelligent Policing
[3] Xihua University, School of Science
Source
Neural Processing Letters | 2023 / Vol. 55
Keywords
Referring image segmentation; Vision and language; Explainable language-structure reasoning
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Referring image segmentation is a multimodal task that aims to segment the object indicated by a natural-language expression, given paired expressions and images. However, the diversity of language annotations tends to cause semantic ambiguity, which makes the semantic representation produced by language feature encoding imprecise. Existing methods do not correct the language encoding module, so semantic errors in the language features cannot be repaired in later stages, resulting in semantic deviation. To this end, we propose a vision-aware language reasoning model. Intuitively, the segmentation result can be used to guide the reconstruction of language features, which can be expressed as a tree-structured recursive process. Specifically, we design a language reasoning encoding module and a mask loopback optimization module to optimize the language encoding tree. The feature weights of the tree nodes are learned through backpropagation. In traditional attention modules, matching local words against visual regions easily introduces noise regions. To overcome this problem, we use global language prior information to compute the importance of different words and then use it to weight the visual region features; this is realized as a language-aware vision attention module. Experimental results on four benchmark datasets show that the proposed method improves performance.
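The language-aware vision attention idea in the abstract (word importance derived from a global language prior, then used to weight visual region features) can be illustrated with a small sketch. This is not the authors' implementation: the mean-pooled global sentence vector, the scaled dot-product scoring, the softmax normalization, and the function name `language_aware_attention` are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def language_aware_attention(word_feats, region_feats):
    """Hypothetical sketch: weight visual regions using globally informed word importance.

    word_feats:   list of T word vectors, each of length D
    region_feats: list of R region vectors, each of length D
    """
    dim = len(word_feats[0])
    scale = math.sqrt(dim)

    # Assumption: the "global language prior" is the mean of the word vectors.
    global_feat = [sum(w[d] for w in word_feats) / len(word_feats) for d in range(dim)]

    # Word importance: similarity of each word to the global sentence vector.
    word_weights = softmax([dot(w, global_feat) / scale for w in word_feats])

    # Importance-weighted sentence query.
    query = [sum(wt * w[d] for wt, w in zip(word_weights, word_feats)) for d in range(dim)]

    # Region attention: similarity of each region to the weighted query.
    region_weights = softmax([dot(query, r) / scale for r in region_feats])

    # Rescale each region feature by its attention weight.
    weighted_regions = [[wt * x for x in r] for wt, r in zip(region_weights, region_feats)]
    return word_weights, region_weights, weighted_regions
```

In this toy form, words that agree with the overall sentence meaning receive larger weights, so regions matching an isolated or noisy word contribute less to the attended visual features.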
Pages: 11313-11331
Page count: 18