AIUnet: Asymptotic inference with U2-Net for referring image segmentation

Cited by: 2

Authors:

Li, Jiangquan [1]

Shan, Shimin [1]

Liu, Yu [1]

Xu, Kaiping [1]

Hu, Xiwen [1]

Xue, Mingcheng [1]

Affiliations:

[1] Dalian University of Technology, Dalian, People's Republic of China

Source:

PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023 | 2023

Keywords:

Multimodal and crossmodal learning; Multimodal fusion; Human-robot/agent interaction

DOI:

10.1145/3577190.3614176

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Referring image segmentation aims to segment a target object from an image given a natural language expression. While recent methods have made remarkable advances, few have designed effective deep fusion processes for cross-modal features or focused on fine visual details. In this paper, we propose AIUnet, an asymptotic inference method that uses U2-Net. The core of AIUnet is a Cross-model U2-Net (CMU) module, which integrates a Text guide vision (TGV) module into U2-Net to achieve efficient interaction of cross-modal information at different scales. CMU focuses more on location information in high-level features and learns finer detail in low-level features. Additionally, we propose a Features Enhance Decoder (FED) module to improve the recognition of fine details and decode cross-modal features into binary masks. The FED module uses a simple CNN-based approach to enhance multi-modal features. Our experiments show that AIUnet achieves competitive results on three standard datasets. Code is available at https://github.com/LJQbiu/AIUnet.

Pages: 24-32

Page count: 9

