Open-Vocabulary And Multitask Image Segmentation

Times cited: 0
Authors
Pan, Lihu [1]
Yang, Yunting [1]
Wang, Zhengkui [2]
Shan, Wen [3]
Yin, Jaili [1]
Affiliations
[1] Taiyuan Univ Sci & Technol, Taiyuan, Peoples R China
[2] Singapore Inst Technol, Infocomm Technol Cluster, Singapore, Singapore
[3] Singapore Univ Social Sci, Singapore, Singapore
Source
39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024 | 2024
Keywords
Image segmentation; Adaptive prompt learning; Image-text fusion; Multitask
DOI
10.1145/3605098.3636192
CLC classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Open-vocabulary learning has transformed image segmentation by enabling models to delineate arbitrary categories specified through textual descriptions. Whereas current methods often rely on specialized architectures, OVAMTSeg is a unified framework for Open-Vocabulary and Multitask Image Segmentation. Through adaptive prompt learning, OVAMTSeg captures category-sensitive concepts, which keeps it robust across diverse multitask scenarios. Text prompts capture semantic and contextual features, while cross-attention and cross-modal interactions fuse image and text features. A transformer-based decoder then produces the dense prediction. Experimental results demonstrate OVAMTSeg's effectiveness: it achieves 47.5 mIoU on referring expression segmentation; 51.6 mIoU on Pascal-VOC with four unseen classes and 46.6 mIoU on Pascal-Context in zero-shot segmentation; and 65.9 mIoU on Pascal-5i and 35.7 mIoU on COCO-20i in one-shot segmentation.
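The abstract names two generic building blocks without detailing them: cross-attention that fuses image and text features, and the mIoU metric behind every reported result. The Python sketch below illustrates both under stated assumptions; it is not OVAMTSeg's implementation. It assumes single-head scaled dot-product attention without learned query/key/value projections, image patch tokens as queries, text tokens as both keys and values, and a residual addition. The function names `softmax`, `cross_attention`, and `mean_iou` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_tokens: Sequence[Vector],
                    text_tokens: Sequence[Vector]) -> List[Vector]:
    """Each image token (query) attends over the text tokens (keys = values).

    Assumption (hypothetical simplification): one head and no learned Q/K/V
    projections. Returns one fused vector per image token: the visual feature
    plus its attention-weighted text context.
    """
    d = len(text_tokens[0])
    scale = 1.0 / math.sqrt(d)  # scaled dot-product attention
    fused = []
    for q in image_tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in text_tokens]
        weights = softmax(scores)
        context = [sum(w * v[j] for w, v in zip(weights, text_tokens)) for j in range(d)]
        # Residual connection: keep the visual feature and add the text context.
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both label maps
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class appears in pred or target")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy example: one 2-d image token attending over two one-hot "text" tokens.
    fused = cross_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
    print(len(fused), len(fused[0]))  # 1 2
    # Flattened 4-pixel label maps with two classes: IoUs 1/2 and 2/3.
    print(round(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2), 4))  # 0.5833
```

Benchmarks usually accumulate intersections and unions over the whole dataset before averaging over classes; passing the concatenated pixel labels of every image to `mean_iou` does exactly that. Evaluation protocols differ across the tasks listed in the abstract (referring expression segmentation, for instance, commonly averages per-expression IoU), so the sketch is indicative only.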
Pages: 1048-1049
Page count: 2