Bidirectional Relationship Inferring Network for Referring Image Localization and Segmentation

Cited by: 11
Authors
Feng, Guang [1]
Hu, Zhiwei [1]
Zhang, Lihe [1]
Sun, Jiayu [1]
Lu, Huchuan [1]
Affiliation
[1] Dalian Univ Technol, Sch Informat & Commun Engn, Dalian 116024, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image segmentation; Location awareness; Visualization; Task analysis; Linguistics; Semantics; Feature extraction; Language-guided visual attention; referring image localization and segmentation; segmentation-guided feature augmentation; vision-guided linguistic attention (VLAM);
DOI
10.1109/TNNLS.2021.3106153
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, referring image localization and segmentation has aroused widespread interest. However, the existing methods lack a clear description of the interdependence between language and vision. To this end, we present a bidirectional relationship inferring network (BRINet) to effectively address the challenging tasks. Specifically, we first employ a vision-guided linguistic attention module to perceive the keywords corresponding to each image region. Then, language-guided visual attention adopts the learned adaptive language to guide the update of the visual features. Together, they form a bidirectional cross-modal attention module (BCAM) to achieve the mutual guidance between language and vision. They can help the network align the cross-modal features better. Based on the vanilla language-guided visual attention, we further design an asymmetric language-guided visual attention, which significantly reduces the computational cost by modeling the relationship between each pixel and each pooled subregion. In addition, a segmentation-guided bottom-up augmentation module (SBAM) is utilized to selectively combine multilevel information flow for object localization. Experiments show that our method outperforms other state-of-the-art methods on three referring image localization datasets and four referring image segmentation datasets.
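The asymmetric attention idea in the abstract, where each pixel attends to a small set of pooled subregions rather than to every other pixel, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it omits the language features and all learned projections, assumes average pooling over a regular grid, and the names `avg_pool_regions`, `asymmetric_attention`, and `grid` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def avg_pool_regions(feat, grid):
    """Average-pool an (H, W, C) map into grid*grid subregions.

    Assumption: H and W are divisible by `grid`. Returns (grid*grid, C).
    """
    H, W, C = feat.shape
    assert H % grid == 0 and W % grid == 0
    h, w = H // grid, W // grid
    pooled = feat.reshape(grid, h, grid, w, C).mean(axis=(1, 3))
    return pooled.reshape(grid * grid, C)


def asymmetric_attention(feat, grid):
    """Pixel-to-subregion attention (hypothetical simplification).

    Queries are the H*W pixels; keys and values are the pooled subregions,
    so the affinity matrix is (H*W, grid*grid) rather than (H*W, H*W).
    """
    H, W, C = feat.shape
    queries = feat.reshape(H * W, C)
    keys = avg_pool_regions(feat, grid)          # (S, C), S = grid * grid
    scores = queries @ keys.T / np.sqrt(C)       # (H*W, S)
    weights = softmax(scores, axis=-1)           # each row sums to 1
    out = weights @ keys                         # (H*W, C)
    return out.reshape(H, W, C), weights


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 4))
out, weights = asymmetric_attention(feat, grid=2)
print(out.shape, weights.shape)
```

For an 8 x 8 map with `grid=2`, the affinity matrix has 64 x 4 entries instead of the 64 x 64 that pixel-to-pixel attention would need, which is the kind of saving the abstract refers to.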
Pages: 2246-2258
Number of pages: 13
Related papers
50 in total
  • [41] CPFNet: Context Pyramid Fusion Network for Medical Image Segmentation
    Feng, Shuanglang
    Zhao, Heming
    Shi, Fei
    Cheng, Xuena
    Wang, Meng
    Ma, Yuhui
    Xiang, Dehui
    Zhu, Weifang
    Chen, Xinjian
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (10) : 3008 - 3018
  • [42] Deep Multimodal Fusion Network for Semantic Segmentation Using Remote Sensing Image and LiDAR Data
    Sun, Yangjie
    Fu, Zhongliang
    Sun, Chuanxia
    Hu, Yinglei
    Zhang, Shengyuan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [43] DPKI-Net: Dual Prior Knowledge Injection Network for Multitask 3-D Medical Image Segmentation and Landmark Localization
    Li, Xiang
    Li, Like
    Zhang, Kesheng
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2025, 74
  • [44] EAN: Edge-Aware Network for Image Manipulation Localization
    Chen, Yun
    Cheng, Hang
    Wang, Haichou
    Liu, Ximeng
    Chen, Fei
    Li, Fengyong
    Zhang, Xinpeng
    Wang, Meiqing
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (02) : 1591 - 1601
  • [45] Bidirectional Feature Aggregation Network for Stereo Image Quality Assessment Considering Parallax Attention-Based Binocular Fusion
    Chang, Yongli
    Li, Sumei
    Liu, Anqi
    Zhang, Wenlin
    Jin, Jie
    Xiang, Wei
    IEEE TRANSACTIONS ON BROADCASTING, 2024, 70 (01) : 278 - 289
  • [46] Brain Image Segmentation for Ultrascale Neuron Reconstruction via an Adaptive Dual-Task Learning Network
    Liu, Min
    Wu, Shuhan
    Chen, Runze
    Lin, Zhuangdian
    Wang, Yaonan
    Meijering, Erik
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (07) : 2574 - 2586
  • [47] Visual Relationship Embedding Network for Image Paragraph Generation
    Che, Wenbin
    Fan, Xiaopeng
    Xiong, Ruiqin
    Zhao, Debin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (09) : 2307 - 2320
  • [48] Uncertainty-Aware Hierarchical Aggregation Network for Medical Image Segmentation
    Zhou, Tao
    Zhou, Yi
    Li, Guangyu
    Chen, Geng
    Shen, Jianbing
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7440 - 7453
  • [49] Medical Image Segmentation With Limited Supervision: A Review of Deep Network Models
    Peng, Jialin
    Wang, Ye
    IEEE ACCESS, 2021, 9 : 36827 - 36851
  • [50] PSRN: Polarimetric Space Reconstruction Network for PolSAR Image Semantic Segmentation
    Jing, Hao
    Wang, Zhirui
    Sun, Xian
    Xiao, Daifeng
    Fu, Kun
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 10716 - 10732