Mask-Free Video Instance Segmentation

Cited by: 9
Authors
Ke, Lei [1, 2]
Danelljan, Martin [1]
Ding, Henghui [1]
Tai, Yu-Wing [2]
Tang, Chi-Keung [2]
Yu, Fisher [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] HKUST, Hong Kong, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
DOI
10.1109/CVPR52729.2023.02189
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The recent advancement in Video Instance Segmentation (VIS) has largely been driven by the use of deeper and increasingly data-hungry transformer-based models. However, video masks are tedious and expensive to annotate, limiting the scale and diversity of existing VIS datasets. In this work, we aim to remove the mask-annotation requirement. We propose MaskFreeVIS, achieving highly competitive VIS performance, while only using bounding box annotations for the object state. We leverage the rich temporal mask consistency constraints in videos by introducing the Temporal KNN-patch Loss (TK-Loss), providing strong mask supervision without any labels. Our TK-Loss finds one-to-many matches across frames, through an efficient patch-matching step followed by a K-nearest neighbor selection. A consistency loss is then enforced on the found matches. Our mask-free objective is simple to implement, has no trainable parameters, is computationally efficient, yet outperforms baselines employing, e.g., state-of-the-art optical flow to enforce temporal mask consistency. We validate MaskFreeVIS on the YouTube-VIS 2019/2021, OVIS and BDD100K MOTS benchmarks. The results clearly demonstrate the efficacy of our method by drastically narrowing the gap between fully and weakly-supervised VIS performance. Our code and trained models are available at http://vis.xyz/pub/maskfreevis.
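The abstract describes TK-Loss in three steps: patch matching across frames, a K-nearest-neighbor selection of the matches, and a consistency loss on the selected pairs. The NumPy sketch below illustrates those steps for a single pair of frames. It is not the authors' implementation (their code is at the project URL given in the abstract): the function names `tk_matches` and `tk_consistency_loss`, the parameters `radius`, `patch`, `k`, `max_dist` and `eps`, their default values, and the exact loss form -log(p*q + (1 - p)*(1 - q)) are illustrative assumptions.

```python
import numpy as np


def tk_matches(frame_t, frame_s, radius=2, patch=3, k=3, max_dist=0.05):
    """Hypothetical helper: map each pixel (y, x) of frame_t to up to k pixels of frame_s.

    Candidates lie in a (2*radius+1)^2 window around (y, x); a candidate is kept
    only if the mean squared difference between the two patch x patch patches is
    at most max_dist. All parameter defaults are illustrative assumptions.
    """
    h, w = frame_t.shape[:2]
    half = patch // 2
    # Pad the spatial axes only, so every pixel has a full patch around it.
    pad = ((half, half), (half, half)) + ((0, 0),) * (frame_t.ndim - 2)
    pt = np.pad(frame_t, pad, mode="edge")
    ps = np.pad(frame_s, pad, mode="edge")
    matches = {}
    for y in range(h):
        for x in range(w):
            ref = pt[y:y + patch, x:x + patch]  # patch centred on (y, x)
            cands = []
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    d = float(np.mean((ref - ps[yy:yy + patch, xx:xx + patch]) ** 2))
                    if d <= max_dist:  # one-to-many: every close-enough patch is a candidate
                        cands.append((d, yy, xx))
            cands.sort()  # K-nearest selection: smallest patch distances first
            matches[(y, x)] = [(yy, xx) for _, yy, xx in cands[:k]]
    return matches


def tk_consistency_loss(mask_t, mask_s, matches, eps=1e-6):
    """Hypothetical helper: average -log(p*q + (1-p)*(1-q)) over matched pixel pairs.

    p and q are foreground probabilities; the term is small when both masks
    agree (both foreground or both background) at the matched locations.
    The loss form is an assumption made for this sketch.
    """
    terms = []
    for (y, x), targets in matches.items():
        p = float(mask_t[y, x])
        for yy, xx in targets:
            q = float(mask_s[yy, xx])
            agree = p * q + (1.0 - p) * (1.0 - q)
            terms.append(-np.log(max(agree, eps)))  # clamp avoids log(0)
    return float(np.mean(terms)) if terms else 0.0
```

With identical frames, each pixel's best match is itself (patch distance 0), so identical masks give a loss of zero, while a mask paired with its complement gives the maximum per-pair penalty of -log(eps). A practical version would vectorize the search and compute the loss on predicted mask probabilities so that it can be backpropagated; the explicit loops here only favor readability.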
Pages: 22857-22866
Number of pages: 10
Related papers
10 of 50 shown
  • [1] Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations
    Vibashan, V. S.
    Yu, Ning
    Xing, Chen
    Qin, Can
    Gao, Mingfei
    Niebles, Juan Carlos
    Patel, Vishal M.
    Xu, Ran
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 23539 - 23549
  • [2] VIDEO PROJECTOR HAS MASK-FREE TUBE
    YEAPLE, F
    DESIGN NEWS, 1985, 41 (05) : 106 - 107
  • [3] MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model
    Zhang, Zhenghao
    Zhang, Shengfan
    Dai, Zuozhuo
    Dong, Zilong
    Zhu, Siyu
    PATTERN RECOGNITION, 2025, 159
  • [4] Video Mask Transfiner for High-Quality Video Instance Segmentation
    Ke, Lei
    Ding, Henghui
    Danelljan, Martin
    Tai, Yu-Wing
    Tang, Chi-Keung
    Yu, Fisher
    COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 731 - 747
  • [5] Video Instance Segmentation Without Using Mask and Identity Supervision
    Li, Ge
    Cao, Jiale
    Sun, Hanqing
    Anwer, Rao Muhammad
    Xie, Jin
    Khan, Fahad
    Pang, Yanwei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 224 - 235
  • [6] Mask generation dynamically regulates weakly supervised video instance segmentation
    He Z.
    Xu L.
    Zhang Y.
    Huang Y.
    Guangxue Jingmi Gongcheng/Optics and Precision Engineering, 2023, 31 (19): 2884 - 2897
  • [7] Video Instance Segmentation
    Yang, Linjie
    Fan, Yuchen
    Xu, Ning
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5187 - 5196
  • [8] Dual Mask Branches for Instance Segmentation
    Zhang, Xiaoliang
    Liu, Yuankun
    Li, Mao
    Zhang, Yuqing
    Chen, Changfeng
    Yin, Jie
    Zhou, Xiantao
    PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING, ICDSP 2024, 2024, : 39 - 46
  • [9] Adapting Video Instance Segmentation for Instance Search
    Nguyen, An Thi
    20TH INTERNATIONAL CONFERENCE ON CONTENT-BASED MULTIMEDIA INDEXING, CBMI 2023, 2023, : 256 - 260
  • [10] Video Instance Segmentation by Instance Flow Assembly
    Li, Xiang
    Wang, Jinglu
    Li, Xiao
    Lu, Yan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 7469 - 7479