Structured Attention Network for Referring Image Segmentation

Cited by: 33
Authors
Lin, Liang [1]
Yan, Pengxiang [1]
Xu, Xiaoqian [1]
Yang, Sibei [2]
Zeng, Kun [1]
Li, Guanbin [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Engn & Comp Sci, Guangzhou 510006, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Linguistics; Image segmentation; Cognition; Feature extraction; Semantics; Task analysis; Referring image segmentation; vision and language; cross-modal reasoning;
DOI
10.1109/TMM.2021.3074008
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Referring image segmentation aims at segmenting out the object or stuff referred to by a natural language expression. The challenge of this task lies in the requirement of understanding both vision and language. The linguistic structure of a referring expression can provide an intuitive and explainable layout for reasoning over visual and linguistic concepts. In this paper, we propose a structured attention network (SANet) to explore the multimodal reasoning over the dependency tree parsed from the referring expression. Specifically, SANet implements the multimodal reasoning using an attentional multimodal tree-structure recurrent module (AMTreeGRU) in a bottom-up manner. In addition, for spatial detail improvement, SANet further incorporates the semantics-guided low-level features into high-level ones using the proposed attentional skip connection module. Extensive experiments on four public benchmark datasets demonstrate the superiority of our proposed SANet with more explainable visualization examples.
Pages: 1922-1932
Number of pages: 11