PRNet: A Progressive Refinement Network for referring image segmentation

Cited: 0
Authors
Liu, Jing [1 ]
Jiang, Huajie [1 ]
Hu, Yongli [1 ]
Yin, Baocai [1 ]
Affiliations
[1] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, 100 Pingleyuan, Beijing 100124, Peoples R China
Funding
Beijing Natural Science Foundation; National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Referring image segmentation; Position prior; Features alignment; Progressive localization; Transformer;
DOI
10.1016/j.neucom.2025.129698
CLC number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Effective feature alignment between language and image is necessary for correctly inferring the location of the referred instance in the referring image segmentation (RIS) task. Previous studies usually assist target localization with external detectors or use a coarse-grained positional prior during multimodal feature fusion to implicitly strengthen cross-modal alignment. However, these approaches are either limited by the performance of the external detector and the design of the matching algorithm, or they ignore the fine-grained cues in the referring expression when processing the coarse-grained prior, which may lead to inaccurate segmentation results. In this paper, we propose a new RIS network, the Progressive Refinement Network (PRNet), which aims to gradually improve the alignment quality between language and image from coarse to fine. The core of PRNet is the Progressive Refinement Localization Scheme (PRLS), which consists of a Coarse Positional Prior Module (CPPM) and a Refined Localization Module (RLM). The CPPM obtains rough prior positional information and corresponding semantic features by computing a similarity matrix between the sentence and the image. The RLM fuses the visual and language modalities by densely aligning pixels with word features and uses the prior positional information produced by the CPPM to enhance textual semantic understanding, thus guiding the model to perceive the position of the referred instance more accurately. Experimental results show that the proposed PRNet performs well on the three public datasets RefCOCO, RefCOCO+, and RefCOCOg.
Pages: 12
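The following is a minimal, hypothetical PyTorch-style sketch of the two steps summarized in the abstract: a CPPM-like coarse positional prior obtained from a sentence-to-pixel similarity map, and an RLM-like dense pixel-word alignment modulated by that prior. It is not the authors' implementation; all function names, tensor shapes, and the sigmoid/cross-attention choices are illustrative assumptions.

# Sketch only (not the paper's code): coarse sentence-to-pixel similarity prior
# followed by prior-weighted dense pixel-word alignment.
import torch
import torch.nn.functional as F

def coarse_positional_prior(pixel_feats, sent_feat):
    """CPPM-style step: cosine similarity between the sentence embedding and every pixel.
    pixel_feats: (B, C, H, W) visual features
    sent_feat:   (B, C)       sentence-level language feature
    returns:     (B, 1, H, W) rough positional prior in [0, 1]
    """
    B, C, H, W = pixel_feats.shape
    pix = F.normalize(pixel_feats.flatten(2), dim=1)       # (B, C, HW)
    sent = F.normalize(sent_feat, dim=1).unsqueeze(1)       # (B, 1, C)
    sim = torch.bmm(sent, pix).view(B, 1, H, W)             # similarity map
    return sim.sigmoid()

def dense_pixel_word_alignment(pixel_feats, word_feats, prior):
    """RLM-style step: cross-attention from pixels to words, with pixels weighted by the prior.
    word_feats: (B, L, C) word-level language features
    """
    B, C, H, W = pixel_feats.shape
    queries = (pixel_feats * prior).flatten(2).transpose(1, 2)                     # (B, HW, C)
    attn = torch.softmax(queries @ word_feats.transpose(1, 2) / C ** 0.5, dim=-1)  # (B, HW, L)
    aligned = attn @ word_feats                                                     # (B, HW, C)
    return aligned.transpose(1, 2).view(B, C, H, W)

if __name__ == "__main__":
    v = torch.randn(2, 256, 30, 30)   # visual features
    w = torch.randn(2, 20, 256)       # word features
    s = w.mean(dim=1)                 # crude sentence feature, assumed for this sketch
    prior = coarse_positional_prior(v, s)
    out = dense_pixel_word_alignment(v, w, prior)
    print(prior.shape, out.shape)     # (2, 1, 30, 30) and (2, 256, 30, 30)

The usage example at the bottom only checks shapes; in the paper's scheme the prior and the aligned features would feed a segmentation decoder, which is omitted here.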