Recurrent Multimodal Interaction for Referring Image Segmentation

Cited by: 126

Authors:

Liu, Chenxi [1]

Lin, Zhe [2]

Shen, Xiaohui [2]

Yang, Jimei [2]

Lu, Xin [2]

Yuille, Alan [1]

Affiliations:

[1] Johns Hopkins Univ, Baltimore, MD 21218 USA

[2] Adobe Res, San Jose, CA USA

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2017

DOI:

10.1109/ICCV.2017.143

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper we are interested in the problem of image segmentation given natural language descriptions, i.e., referring expressions. Existing works tackle this problem by first modeling images and sentences independently and then segmenting images by combining these two types of representations. We argue that learning word-to-image interaction is more native in the sense of jointly modeling two modalities for the image segmentation task, and we propose a convolutional multimodal LSTM to encode the sequential interactions between individual words, visual information, and spatial information. We show that our proposed model outperforms the baseline model on benchmark datasets. In addition, we analyze the intermediate output of the proposed multimodal LSTM approach and empirically explain how this approach enforces a more effective word-to-image interaction.
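
As a rough illustration of the idea in the abstract, the following is a minimal NumPy sketch of an LSTM that, at every word step, sees the current word embedding together with the visual feature and spatial coordinates of each image location. It is not the authors' implementation. The function names (`coordinate_map`, `multimodal_lstm`), the two-dimensional coordinate encoding, the gate ordering, and the single shared weight matrix `W` are all assumptions made for illustration; applying the same weights at every location is what makes the recurrence behave like a 1x1 convolution.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_map(height, width):
    """Hypothetical spatial encoding: normalized (x, y) in [-1, 1] per location."""
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, height),
        np.linspace(-1.0, 1.0, width),
        indexing="ij",
    )
    return np.stack([xs, ys], axis=-1)  # shape (height, width, 2)


def multimodal_lstm(words, visual, spatial, W, b, hidden):
    """Run one shared LSTM cell at every image location, one step per word.

    words:   (T, Dw)       word embeddings, in sentence order
    visual:  (H, Wd, Dv)   visual feature map
    spatial: (H, Wd, Ds)   per-location coordinate features
    W:       (Dw + Dv + Ds + hidden, 4 * hidden)  shared gate weights (assumed layout)
    b:       (4 * hidden,) gate biases
    Returns the final hidden state, shape (H, Wd, hidden).
    """
    H, Wd, _ = visual.shape
    h = np.zeros((H, Wd, hidden))
    c = np.zeros((H, Wd, hidden))
    for w_t in words:
        # Tile the current word over the whole feature map.
        word_map = np.broadcast_to(w_t, (H, Wd, w_t.shape[0]))
        # Joint input: word + visual + spatial + previous hidden state.
        x = np.concatenate([word_map, visual, spatial, h], axis=-1)
        # Same weights at every location, i.e. a 1x1 convolution.
        gates = x @ W + b
        i, f, o, g = np.split(gates, 4, axis=-1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h
```

In a full model, the returned hidden map would typically be passed through a further per-location classifier to obtain mask logits; that step is omitted here because the abstract does not describe it.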

Pages: 1280-1289

Number of pages: 10

