Mask-Free Video Instance Segmentation

Cited by: 9

Authors:

Ke, Lei [1,2]

Danelljan, Martin [1]

Ding, Henghui [1]

Tai, Yu-Wing [2]

Tang, Chi-Keung [2]

Yu, Fisher [1]

Affiliations:

[1] Swiss Fed Inst Technol, Zurich, Switzerland

[2] HKUST, Hong Kong, Peoples R China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

DOI:

10.1109/CVPR52729.2023.02189

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The recent advancement in Video Instance Segmentation (VIS) has largely been driven by the use of deeper and increasingly data-hungry transformer-based models. However, video masks are tedious and expensive to annotate, limiting the scale and diversity of existing VIS datasets. In this work, we aim to remove the mask-annotation requirement. We propose MaskFreeVIS, achieving highly competitive VIS performance, while only using bounding box annotations for the object state. We leverage the rich temporal mask consistency constraints in videos by introducing the Temporal KNN-patch Loss (TK-Loss), providing strong mask supervision without any labels. Our TK-Loss finds one-to-many matches across frames, through an efficient patch-matching step followed by a K-nearest neighbor selection. A consistency loss is then enforced on the found matches. Our mask-free objective is simple to implement, has no trainable parameters, is computationally efficient, yet outperforms baselines employing, e.g., state-of-the-art optical flow to enforce temporal mask consistency. We validate MaskFreeVIS on the YouTube-VIS 2019/2021, OVIS and BDD100K MOTS benchmarks. The results clearly demonstrate the efficacy of our method by drastically narrowing the gap between fully and weakly-supervised VIS performance. Our code and trained models are available at http://vis.xyz/pub/maskfreevis.
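
The TK-Loss steps named in the abstract (patch matching across frames, K-nearest-neighbor selection, then a consistency loss on the matches) can be illustrated with a toy sketch. This is not the authors' implementation. The function names, the mean-squared patch distance, the search `radius`, the threshold `max_dist`, and the negative-log agreement term are all assumptions chosen for illustration, and the pure-Python loops are only practical for tiny frames.

```python
import math

def patch_distance(img_a, img_b, ya, xa, yb, xb, half):
    """Mean squared difference between two (2*half+1)^2 patches (edge-clamped)."""
    h, w = len(img_a), len(img_a[0])
    total, n = 0.0, 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            # Clamp coordinates so patches near the border stay inside the image.
            pa = img_a[min(max(ya + dy, 0), h - 1)][min(max(xa + dx, 0), w - 1)]
            pb = img_b[min(max(yb + dy, 0), h - 1)][min(max(xb + dx, 0), w - 1)]
            total += (pa - pb) ** 2
            n += 1
    return total / n

def temporal_knn_patch_loss(img_a, img_b, mask_a, mask_b,
                            half=1, radius=2, k=2, max_dist=0.05, eps=1e-6):
    """Toy temporal consistency loss between two frames (hypothetical helper).

    img_*:  2D lists of intensities; mask_*: 2D lists of foreground
    probabilities in [0, 1]. All hyperparameters are illustrative assumptions.
    """
    h, w = len(img_a), len(img_a[0])
    loss, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            # Step 1: patch matching inside a local search window of frame b.
            cands = []
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    d = patch_distance(img_a, img_b, y, x, yy, xx, half)
                    if d <= max_dist:
                        cands.append((d, yy, xx))
            # Step 2: keep the K nearest matches (one-to-many correspondence).
            cands.sort()
            for _, yy, xx in cands[:k]:
                # Step 3: penalize disagreement between matched mask values.
                p, q = mask_a[y][x], mask_b[yy][xx]
                agree = p * q + (1.0 - p) * (1.0 - q)
                loss += -math.log(min(max(agree, eps), 1.0))
                count += 1
    return loss / max(count, 1)

# A 3x3 square that moves one pixel to the right between two 8x8 frames.
frame_a = [[1.0 if 2 <= y < 5 and 2 <= x < 5 else 0.0 for x in range(8)] for y in range(8)]
frame_b = [[1.0 if 2 <= y < 5 and 3 <= x < 6 else 0.0 for x in range(8)] for y in range(8)]
flipped = [[1.0 - v for v in row] for row in frame_b]

loss_consistent = temporal_knn_patch_loss(frame_a, frame_b, frame_a, frame_b)
loss_flipped = temporal_knn_patch_loss(frame_a, frame_b, frame_a, flipped)
print(loss_consistent, loss_flipped)
```

In this toy setup, masks that follow the moving square give a loss of zero, while a mask that is inverted in the second frame gives a large loss. No mask labels are used. The only supervision signal is agreement between matched locations.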

Pages: 22857-22866

Page count: 10

