Recurrent Multimodal Interaction for Referring Image Segmentation

Cited by: 126

Authors:

Liu, Chenxi [1]

Lin, Zhe [2]

Shen, Xiaohui [2]

Yang, Jimei [2]

Lu, Xin [2]

Yuille, Alan [1]

Affiliations:

[1] Johns Hopkins Univ, Baltimore, MD 21218 USA

[2] Adobe Res, San Jose, CA USA

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2017

DOI:

10.1109/ICCV.2017.143

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper we are interested in the problem of image segmentation given natural language descriptions, i.e., referring expressions. Existing works tackle this problem by first modeling images and sentences independently and then segmenting images by combining these two types of representations. We argue that learning word-to-image interaction is more native in the sense of jointly modeling two modalities for the image segmentation task, and we propose a convolutional multimodal LSTM to encode the sequential interactions between individual words, visual information, and spatial information. We show that our proposed model outperforms the baseline model on benchmark datasets. In addition, we analyze the intermediate output of the proposed multimodal LSTM approach and empirically explain how this approach enforces a more effective word-to-image interaction.

Pages: 1280-1289

Page count: 10

50 items in total

[21] Structured Attention Network for Referring Image Segmentation
Lin, Liang
Yan, Pengxiang
Xu, Xiaoqian
Yang, Sibei
Zeng, Kun
Li, Guanbin
IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1922 - 1932
[22] REFERRING IMAGE SEGMENTATION FOR REMOTE SENSING DATA
Yuan, Zhenghang
Mou, Lichao
Hua, Yuansheng
Zhu, Xiao Xiang
IGARSS 2024-2024 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, IGARSS 2024, 2024 : 946 - 949
[23] PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Liu, Jiang
Ding, Hui
Cai, Zhaowei
Zhang, Yuting
Satzoda, Ravi Kumar
Mahadevan, Vijay
Manmatha, R.
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023 : 18653 - 18663
[24] CRIS: CLIP-Driven Referring Image Segmentation
Wang, Zhaoqing
Lu, Yu
Li, Qiang
Tao, Xunqiang
Guo, Yandong
Gong, Mingming
Liu, Tongliang
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022 : 11676 - 11685
[25] Dual Convolutional LSTM Network for Referring Image Segmentation
Ye, Linwei
Liu, Zhi
Wang, Yang
IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (12) : 3224 - 3235
[26] Attentive Excitation and Aggregation for Bilingual Referring Image Segmentation
Zhou, Qianli
Hui, Tianrui
Wang, Rong
Hu, Haimiao
Liu, Si
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2021, 12 (02)
[27] A survey of methods for addressing the challenges of referring image segmentation
Ji, Lixia
Du, Yunlong
Dang, Yiping
Gao, Wenzhao
Zhang, Han
NEUROCOMPUTING, 2024, 583
[28] Locate then Segment: A Strong Pipeline for Referring Image Segmentation
Jing, Ya
Kong, Tao
Wang, Wei
Wang, Liang
Li, Lei
Tan, Tieniu
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021 : 9853 - 9862
[29] Learning From Box Annotations for Referring Image Segmentation
Feng, Guang
Zhang, Lihe
Hu, Zhiwei
Lu, Huchuan
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3927 - 3937
[30] PRNet: A Progressive Refinement Network for referring image segmentation
Liu, Jing
Jiang, Huajie
Hu, Yongli
Yin, Baocai
NEUROCOMPUTING, 2025, 630