Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations

Cited by: 9

Authors:

Vibashan, V. S. [1]

Yu, Ning [2]

Xing, Chen [2]

Qin, Can [3]

Gao, Mingfei [2]

Niebles, Juan Carlos [2]

Patel, Vishal M. [1]

Xu, Ran [2]

Affiliations:

[1] Johns Hopkins Univ, Baltimore, MD 21218 USA

[2] Northeastern Univ, Boston, MA 02115 USA

[3] Salesforce Res, Hong Kong, Peoples R China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

DOI:

10.1109/CVPR52729.2023.02254

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Existing instance segmentation models learn task-specific information using manual mask annotations from base (training) categories. These mask annotations require tremendous human effort, limiting scalability to novel (new) categories. To alleviate this problem, Open-Vocabulary (OV) methods leverage large-scale image-caption pairs and vision-language models to learn novel categories. In summary, an OV method learns task-specific information using strong supervision from base annotations and novel category information using weak supervision from image-caption pairs. This difference between strong and weak supervision leads to overfitting on base categories, resulting in poor generalization towards novel categories. In this work, we overcome this issue by learning both base and novel categories from pseudo-mask annotations generated by the vision-language model in a weakly supervised manner using our proposed Mask-free OVIS pipeline. Our method automatically generates pseudo-mask annotations by leveraging the localization ability of a pre-trained vision-language model for objects present in image-caption pairs. The generated pseudo-mask annotations are then used to supervise an instance segmentation model, freeing the entire pipeline from any labour-expensive instance-level annotations and from overfitting. Our extensive experiments show that our method trained with just pseudo-masks significantly improves the mAP scores on the MS-COCO dataset and OpenImages dataset compared to the recent state-of-the-art methods trained with manual masks. Code and models are available at https://vibashan.github.io/ovis-web/.
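
Illustrative sketch (not from the paper): the abstract says pseudo-masks are derived from the localization ability of a vision-language model, but it does not specify how. The Python snippet below shows only one generic way such a step could look, assuming a hypothetical 2D activation map for a caption word is already available. The function name, the min-max normalization, and the 0.5 threshold are all assumptions made for illustration, and the authors' actual pipeline may differ substantially.

```python
import numpy as np


def pseudo_mask_from_activation(activation, threshold=0.5):
    """Turn a 2D activation map into a binary pseudo-mask and a bounding box.

    `activation` is a hypothetical stand-in for a localization map that a
    vision-language model might produce for one object word in a caption.
    Returns (mask, box), where box is (x_min, y_min, x_max, y_max) in pixel
    indices, or None when no pixel passes the threshold.
    """
    act = np.asarray(activation, dtype=np.float64)
    lo, hi = act.min(), act.max()
    if hi == lo:
        # A constant map carries no localization signal.
        return np.zeros(act.shape, dtype=bool), None

    # Min-max normalize to [0, 1] so the threshold is scale independent.
    norm = (act - lo) / (hi - lo)
    mask = norm >= threshold
    if not mask.any():
        return mask, None

    # Tightest box around the thresholded region.
    ys, xs = np.nonzero(mask)
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return mask, box


# Toy example: a 4x5 map with a bright 2x2 region.
toy = np.zeros((4, 5))
toy[1:3, 2:4] = 1.0
toy_mask, toy_box = pseudo_mask_from_activation(toy)
```

In a setting like the one the abstract describes, masks of this kind would stand in for manual annotations when training an instance segmentation model; that training step is not shown here.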


Pages: 23539-23549

Page count: 11
