Bidirectional Relationship Inferring Network for Referring Image Localization and Segmentation

Cited by: 11

Authors:

Feng, Guang [1]

Hu, Zhiwei [1]

Zhang, Lihe [1]

Sun, Jiayu [1]

Lu, Huchuan [1]

Affiliations:

[1] Dalian Univ Technol, Sch Informat & Commun Engn, Dalian 116024, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2023 / Vol. 34 / No. 5

Funding:

National Natural Science Foundation of China

Keywords:

Image segmentation; Location awareness; Visualization; Task analysis; Linguistics; Semantics; Feature extraction; Language-guided visual attention; referring image localization and segmentation; segmentation-guided feature augmentation; vision-guided linguistic attention (VLAM);

DOI:

10.1109/TNNLS.2021.3106153

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recently, referring image localization and segmentation has aroused widespread interest. However, the existing methods lack a clear description of the interdependence between language and vision. To this end, we present a bidirectional relationship inferring network (BRINet) to effectively address the challenging tasks. Specifically, we first employ a vision-guided linguistic attention module to perceive the keywords corresponding to each image region. Then, language-guided visual attention adopts the learned adaptive language to guide the update of the visual features. Together, they form a bidirectional cross-modal attention module (BCAM) to achieve the mutual guidance between language and vision. They can help the network align the cross-modal features better. Based on the vanilla language-guided visual attention, we further design an asymmetric language-guided visual attention, which significantly reduces the computational cost by modeling the relationship between each pixel and each pooled subregion. In addition, a segmentation-guided bottom-up augmentation module (SBAM) is utilized to selectively combine multilevel information flow for object localization. Experiments show that our method outperforms other state-of-the-art methods on three referring image localization datasets and four referring image segmentation datasets.
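
The attention steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the BRINet implementation: the scaled dot-product scoring, the elementwise fusion of visual and language features, the residual update, and average pooling for the subregions are all assumptions made for illustration, and every function name below is hypothetical. The sketch shows (1) attention from each image region over the words, (2) attention from each region over all regions, conditioned on the region-specific language feature, and (3) the asymmetric variant, where each pixel attends to a small set of pooled subregions instead of every other pixel.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def vision_guided_linguistic_attention(visual, words):
    """visual: (N, d) region features; words: (T, d) word features.
    Returns (N, d) region-specific language features and (N, T) weights."""
    scores = visual @ words.T / np.sqrt(visual.shape[1])  # assumed scoring
    weights = softmax(scores, axis=1)
    return weights @ words, weights

def language_guided_visual_attention(visual, adaptive_lang, keys=None):
    """Update each region by attending over `keys` (all regions by default).
    Queries are language-conditioned via elementwise product (assumed fusion)."""
    keys = visual if keys is None else keys
    query = visual * adaptive_lang
    scores = query @ keys.T / np.sqrt(visual.shape[1])
    weights = softmax(scores, axis=1)
    return visual + weights @ keys, weights  # residual update (assumed)

def pool_subregions(feature_map, grid):
    """Average-pool an (H, W, d) map into (grid * grid, d) subregion features."""
    H, W, d = feature_map.shape
    assert H % grid == 0 and W % grid == 0
    blocks = feature_map.reshape(grid, H // grid, grid, W // grid, d)
    return blocks.mean(axis=(1, 3)).reshape(grid * grid, d)

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
H, W, T, d, grid = 8, 8, 5, 16, 2
feature_map = rng.standard_normal((H, W, d))
words = rng.standard_normal((T, d))
visual = feature_map.reshape(H * W, d)

lang, w_lang = vision_guided_linguistic_attention(visual, words)
full, w_full = language_guided_visual_attention(visual, lang)
pooled = pool_subregions(feature_map, grid)
asym, w_asym = language_guided_visual_attention(visual, lang, keys=pooled)

print(w_full.shape, w_asym.shape)  # (64, 64) (64, 4)
```

In this toy setting the full attention map has 64 x 64 entries while the asymmetric one has 64 x 4, which is the kind of saving the abstract attributes to modeling pixel-to-subregion rather than pixel-to-pixel relationships.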

Pages: 2246-2258

Page count: 13
