Bidirectional Relationship Inferring Network for Referring Image Localization and Segmentation

Cited by: 13
Authors
Feng, Guang [1 ]
Hu, Zhiwei [1 ]
Zhang, Lihe [1 ]
Sun, Jiayu [1 ]
Lu, Huchuan [1 ]
Affiliations
[1] Dalian Univ Technol, Sch Informat & Commun Engn, Dalian 116024, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image segmentation; Location awareness; Visualization; Task analysis; Linguistics; Semantics; Feature extraction; Language-guided visual attention; referring image localization and segmentation; segmentation-guided feature augmentation; vision-guided linguistic attention (VLAM);
DOI
10.1109/TNNLS.2021.3106153
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, referring image localization and segmentation have attracted widespread interest. However, existing methods lack a clear description of the interdependence between language and vision. To this end, we present a bidirectional relationship inferring network (BRINet) to effectively address these challenging tasks. Specifically, we first employ a vision-guided linguistic attention module (VLAM) to perceive the keywords corresponding to each image region. Then, a language-guided visual attention module adopts the learned adaptive language features to guide the update of the visual features. Together, they form a bidirectional cross-modal attention module (BCAM) that achieves mutual guidance between language and vision and helps the network align cross-modal features better. Building on the vanilla language-guided visual attention, we further design an asymmetric language-guided visual attention, which significantly reduces the computational cost by modeling the relationship between each pixel and each pooled subregion rather than between every pair of pixels. In addition, a segmentation-guided bottom-up augmentation module (SBAM) selectively combines multilevel information flow for object localization. Experiments show that our method outperforms other state-of-the-art methods on three referring image localization datasets and four referring image segmentation datasets.
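The cost saving behind the asymmetric language-guided visual attention can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the language-vision fusion is simplified to elementwise modulation, and the pooling grid size is chosen for illustration. The point is that each pixel attends to only K = pool² pooled subregions instead of all H·W pixels, shrinking the attention matrix from N×N to N×K.

```python
import numpy as np

def asymmetric_cross_attention(visual, lang, pool=4):
    """Sketch of asymmetric language-guided visual attention.

    visual: (H, W, C) visual feature map
    lang:   (C,) sentence-level language feature
    Keys/values are average-pooled into a pool x pool grid of subregions,
    so each pixel (query) attends to pool*pool entries, not H*W.
    """
    H, W, C = visual.shape

    # Fuse language into the key/value path (elementwise modulation here,
    # a hypothetical stand-in for the paper's learned fusion).
    fused = visual * lang                              # broadcast over (H, W, C)

    # Average-pool the fused map into pool x pool subregions (keys/values).
    hs, ws = H // pool, W // pool
    regions = fused[:hs * pool, :ws * pool]
    regions = regions.reshape(pool, hs, pool, ws, C).mean(axis=(1, 3))
    kv = regions.reshape(pool * pool, C)               # (K, C), K = pool^2

    # Queries: one per pixel.
    q = visual.reshape(H * W, C)                       # (N, C), N = H*W

    # Scaled dot-product attention: scores are N x K instead of N x N.
    scores = q @ kv.T / np.sqrt(C)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)      # softmax over K regions

    out = weights @ kv                                 # (N, C) attended features
    return out.reshape(H, W, C)
```

For a 64x64 feature map, full pixel-to-pixel attention would form a 4096x4096 score matrix, while this asymmetric variant forms only 4096x16, which is the source of the computational saving the abstract refers to.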
Pages: 2246-2258 (13 pages)