Mask-Free Video Instance Segmentation

Cited by: 16
Authors
Ke, Lei [1 ,2 ]
Danelljan, Martin [1 ]
Ding, Henghui [1 ]
Tai, Yu-Wing [2 ]
Tang, Chi-Keung [2 ]
Yu, Fisher [1 ]
Affiliations
[1] Swiss Federal Institute of Technology (ETH Zurich), Zurich, Switzerland
[2] HKUST, Hong Kong, People's Republic of China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
Keywords
DOI
10.1109/CVPR52729.2023.02189
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The recent advancement in Video Instance Segmentation (VIS) has largely been driven by the use of deeper and increasingly data-hungry transformer-based models. However, video masks are tedious and expensive to annotate, limiting the scale and diversity of existing VIS datasets. In this work, we aim to remove the mask-annotation requirement. We propose MaskFreeVIS, achieving highly competitive VIS performance, while only using bounding box annotations for the object state. We leverage the rich temporal mask consistency constraints in videos by introducing the Temporal KNN-patch Loss (TK-Loss), providing strong mask supervision without any labels. Our TK-Loss finds one-to-many matches across frames, through an efficient patch-matching step followed by a K-nearest neighbor selection. A consistency loss is then enforced on the found matches. Our mask-free objective is simple to implement, has no trainable parameters, is computationally efficient, yet outperforms baselines employing, e.g., state-of-the-art optical flow to enforce temporal mask consistency. We validate MaskFreeVIS on the YouTube-VIS 2019/2021, OVIS and BDD100K MOTS benchmarks. The results clearly demonstrate the efficacy of our method by drastically narrowing the gap between fully and weakly-supervised VIS performance. Our code and trained models are available at http://vis.xyz/pub/maskfreevis.
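The abstract's description of the Temporal KNN-patch Loss (patch matching across frames, K-nearest-neighbor selection of candidate matches, then a consistency penalty on the matched mask predictions) can be illustrated with a minimal sketch. The code below is a simplified, conceptual PyTorch rendering under stated assumptions, not the authors' implementation (their code is at http://vis.xyz/pub/maskfreevis); the function name, arguments, and defaults (k, radius, patch) are illustrative, and the consistency term is reduced to an absolute difference for brevity.

# Conceptual sketch of a TK-Loss-style objective (illustrative only):
# for each location in frame t, compare its local patch against patches in a
# search window of frame t+1, keep the K closest matches, and penalize
# disagreement between the predicted mask probabilities at matched locations.
import torch
import torch.nn.functional as F

def tk_loss_sketch(feat_t, feat_t1, mask_t, mask_t1, k=5, radius=3, patch=3):
    # feat_*: (C, H, W) per-frame features; mask_*: (H, W) mask probabilities.
    C, H, W = feat_t.shape
    pad = patch // 2
    # Describe every spatial location by its local patch (patch matching step).
    pt = F.unfold(feat_t.unsqueeze(0), patch, padding=pad).squeeze(0).t().reshape(H, W, -1)
    pt1 = F.unfold(feat_t1.unsqueeze(0), patch, padding=pad).squeeze(0).t().reshape(H, W, -1)

    dists, cand_masks = [], []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Candidate match at offset (dy, dx) in frame t+1.
            shifted = torch.roll(pt1, shifts=(dy, dx), dims=(0, 1))
            dists.append((pt - shifted).pow(2).sum(-1))                 # (H, W)
            cand_masks.append(torch.roll(mask_t1, shifts=(dy, dx), dims=(0, 1)))

    dists = torch.stack(dists)                                          # (S, H, W)
    cand_masks = torch.stack(cand_masks)                                # (S, H, W)
    # One-to-many matching: keep the K nearest candidates per location.
    knn_idx = dists.topk(k, dim=0, largest=False).indices               # (k, H, W)
    matched = torch.gather(cand_masks, 0, knn_idx)                      # (k, H, W)
    # Consistency loss on matched locations (simplified to an L1 penalty here).
    return (mask_t.unsqueeze(0) - matched).abs().mean()

Because the loss only compares predicted masks across frames, it has no trainable parameters and requires no mask labels, which matches the abstract's claim that the objective is simple and computationally efficient.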
Pages: 22857-22866
Page count: 10