Recurrent Multimodal Interaction for Referring Image Segmentation

Cited by: 126

Authors:

Liu, Chenxi [1]

Lin, Zhe [2]

Shen, Xiaohui [2]

Yang, Jimei [2]

Lu, Xin [2]

Yuille, Alan [1]

Affiliations:

[1] Johns Hopkins Univ, Baltimore, MD 21218 USA

[2] Adobe Res, San Jose, CA USA

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2017

DOI:

10.1109/ICCV.2017.143

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we are interested in the problem of image segmentation given natural language descriptions, i.e., referring expressions. Existing works tackle this problem by first modeling images and sentences independently and then segmenting images by combining these two types of representations. We argue that learning word-to-image interaction is more native in the sense of jointly modeling two modalities for the image segmentation task, and we propose a convolutional multimodal LSTM to encode the sequential interactions between individual words, visual information, and spatial information. We show that our proposed model outperforms the baseline model on benchmark datasets. In addition, we analyze the intermediate output of the proposed multimodal LSTM approach and empirically explain how this approach enforces a more effective word-to-image interaction.
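To make the idea of word-to-image interaction concrete, the following is a minimal NumPy sketch of a multimodal LSTM applied at every spatial location with shared weights. It is an illustration under stated assumptions, not the authors' implementation: the function name `multimodal_lstm`, the gate ordering, the two-channel coordinate encoding, and all dimensions are hypothetical choices made here, and the weights are random rather than trained. At each word step, the word vector is tiled over the feature map and concatenated with the visual features and the spatial coordinates before the recurrent update.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multimodal_lstm(words, visual, coords, W, b):
    """Run one LSTM per spatial location, sharing weights across locations.

    words:  (T, Dw)      one embedding per word (hypothetical input)
    visual: (H, Wd, Dv)  visual feature map
    coords: (H, Wd, Ds)  spatial coordinate channels
    W:      (Dw + Dv + Ds + Dh, 4 * Dh), b: (4 * Dh,)
    Returns the final hidden state, shape (H, Wd, Dh).
    """
    H, Wd, _ = visual.shape
    Dh = b.shape[0] // 4
    h = np.zeros((H, Wd, Dh))
    c = np.zeros((H, Wd, Dh))
    for w in words:
        # Tile the current word over every location of the feature map.
        w_tiled = np.broadcast_to(w, (H, Wd, w.shape[0]))
        # Joint input: word, visual, and spatial information plus previous state.
        x = np.concatenate([w_tiled, visual, coords, h], axis=-1)
        # A matrix product on the last axis acts like a 1x1 convolution.
        gates = x @ W + b
        i, f, o, g = np.split(gates, 4, axis=-1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

# Toy usage with random inputs (all sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
T, Dw, Dv, Ds, Dh, H, Wd = 5, 8, 16, 2, 12, 4, 6
words = rng.normal(size=(T, Dw))
visual = rng.normal(size=(H, Wd, Dv))
ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, Wd), indexing="ij")
coords = np.stack([xs, ys], axis=-1)
W = rng.normal(scale=0.1, size=(Dw + Dv + Ds + Dh, 4 * Dh))
b = np.zeros(4 * Dh)

h = multimodal_lstm(words, visual, coords, W, b)
logits = h @ rng.normal(size=Dh)  # per-location segmentation score, shape (H, Wd)
```

Because the hidden state at each location is updated word by word, the intermediate states can be inspected after any word, which is the kind of analysis the abstract refers to.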


Pages: 1280-1289

Page count: 10
