Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations

Cited by: 9
Authors
Vibashan, V. S. [1]
Yu, Ning [2]
Xing, Chen [2]
Qin, Can [3]
Gao, Mingfei [2]
Niebles, Juan Carlos [2]
Patel, Vishal M. [1]
Xu, Ran [2]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Northeastern Univ, Boston, MA 02115 USA
[3] Salesforce Res, Hong Kong, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
DOI
10.1109/CVPR52729.2023.02254
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing instance segmentation models learn task-specific information using manual mask annotations from base (training) categories. These mask annotations require tremendous human effort, which limits how readily novel (new) categories can be annotated. To alleviate this problem, Open-Vocabulary (OV) methods leverage large-scale image-caption pairs and vision-language models to learn novel categories. In summary, an OV method learns task-specific information using strong supervision from base annotations and novel-category information using weak supervision from image-caption pairs. This difference between strong and weak supervision leads to overfitting on base categories, resulting in poor generalization to novel categories. In this work, we overcome this issue by learning both base and novel categories from pseudo-mask annotations generated by the vision-language model in a weakly supervised manner using our proposed Mask-free OVIS pipeline. Our method automatically generates pseudo-mask annotations by leveraging the localization ability of a pre-trained vision-language model for objects present in image-caption pairs. The generated pseudo-mask annotations are then used to supervise an instance segmentation model, freeing the entire pipeline from labour-intensive instance-level annotations and from overfitting to base categories. Our extensive experiments show that our method, trained with just pseudo-masks, significantly improves the mAP scores on the MS-COCO and OpenImages datasets compared to recent state-of-the-art methods trained with manual masks. Code and models are available at https://vibashan.github.io/ovis-web/.
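As a rough illustration of what "generating a pseudo-mask from a model's localization ability" can mean, the sketch below binarizes a toy activation map and derives a bounding box from it. This is not the authors' pipeline: the relative-threshold rule, the function names `pseudo_mask_from_activation` and `mask_to_box`, and the toy activation values are all assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]


def pseudo_mask_from_activation(activation: Grid, rel_threshold: float = 0.5) -> List[List[int]]:
    """Binarize an activation map at a fraction of its peak value (hypothetical rule)."""
    peak = max(max(row) for row in activation)
    if peak <= 0:
        # No positive response anywhere: return an empty mask of the same shape.
        return [[0 for _ in row] for row in activation]
    cutoff = rel_threshold * peak
    return [[1 if value >= cutoff else 0 for value in row] for row in activation]


def mask_to_box(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) of foreground pixels, or None for an empty mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    return min(xs), min(ys), max(xs), max(ys)


# Toy 4x5 activation map standing in for a model's response to one caption word.
activation = [
    [0.0, 0.1, 0.1, 0.0, 0.0],
    [0.1, 0.6, 0.9, 0.2, 0.0],
    [0.0, 0.7, 1.0, 0.3, 0.0],
    [0.0, 0.1, 0.2, 0.0, 0.0],
]
mask = pseudo_mask_from_activation(activation)
box = mask_to_box(mask)
```

In a setup like the one the abstract describes, masks produced without human labelling would then serve as training targets for an instance segmentation model; how the activation maps are obtained and refined is left open here.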
Pages: 23539-23549
Page count: 11