Open-Vocabulary And Multitask Image Segmentation

Cited by: 0

Authors:

Pan, Lihu [1]

Yang, Yunting [1]

Wang, Zhengkui [2]

Shan, Wen [3]

Yin, Jaili [1]

Affiliations:

[1] Taiyuan Univ Sci & Technol, Taiyuan, Peoples R China

[2] Singapore Inst Technol, Infocomm Technol Cluster, Singapore, Singapore

[3] Singapore Univ Social Sci, Singapore, Singapore

Source:

39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024 | 2024

Keywords:

Image segmentation; Adaptive prompt learning; Image-text fusion; Multitask;

DOI:

10.1145/3605098.3636192

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Open-vocabulary learning has revolutionized image segmentation, enabling the delineation of arbitrary categories from textual descriptions. While current methods often employ specialized architectures, OVAMTSeg is a unified framework for Open-Vocabulary and Multitask Image Segmentation. Leveraging adaptive prompt learning, OVAMTSeg captures category-sensitive concepts and remains robust across diverse multi-task scenarios. Text prompts capture semantic and contextual features, while cross-attention and cross-modal interactions fuse image and text features. The framework incorporates a transformer-based decoder for dense prediction. Experimental results demonstrate OVAMTSeg's effectiveness, achieving 47.5 mIoU in referring expression segmentation, 51.6 mIoU on Pascal-VOC with four unseen classes, 46.6 mIoU on Pascal-Context in zero-shot segmentation, and 65.9 mIoU on Pascal-5i and 35.7 mIoU on COCO-20i for one-shot segmentation.


Pages: 1048-1049

Page count: 2

