AIUnet: Asymptotic inference with U2-Net for referring image segmentation

Cited by: 2
Authors
Li, Jiangquan [1]
Shan, Shimin [1]
Liu, Yu [1]
Xu, Kaiping [1]
Hu, Xiwen [1]
Xue, Mingcheng [1]
Affiliations
[1] Dalian Univ Technol, Dalian, Peoples R China
Source
PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023 | 2023
Keywords
Multimodal and crossmodal learning; Multimodal fusion; Human-robot/agent interaction;
DOI
10.1145/3577190.3614176
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Referring image segmentation aims to segment a target object from an image given a natural language expression. While recent methods have made remarkable advances, few have designed effective deep fusion processes for cross-modal features or focused on fine visual details. In this paper, we propose AIUnet, an asymptotic inference method that uses U2-Net. The core of AIUnet is a Cross-model U2-Net (CMU) module, which integrates a Text guide vision (TGV) module into U2-Net, achieving efficient interaction of cross-modal information at different scales. CMU focuses more on location information in high-level features and learns finer detail in low-level features. Additionally, we propose a Features Enhance Decoder (FED) module to improve the recognition of fine details and decode cross-modal features into binary masks. The FED module leverages a simple CNN-based approach to enhance multi-modal features. Our experiments show that AIUnet achieves competitive results on three standard datasets. Code is available at https://github.com/LJQbiu/AIUnet.
Pages: 24-32
Page count: 9