VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

Cited by: 1
Authors
Wang, Xudong [1]
Misra, Ishan
Zeng, Ziyun
Girdhar, Rohit
Darrell, Trevor
Affiliation
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Source
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024
DOI
10.1109/CVPR52733.2024.02147
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing approaches to unsupervised video instance segmentation typically rely on motion estimates and experience difficulties tracking small or divergent motions. We present VideoCutLER, a simple method for unsupervised multi-instance video segmentation without using motion-based learning signals like optical flow or training on natural videos. Our key insight is that using high-quality pseudo masks and a simple video synthesis method for model training is surprisingly sufficient to enable the resulting video model to effectively segment and track multiple instances across video frames. We show the first competitive unsupervised learning results on the challenging YouTubeVIS-2019 benchmark, achieving 50.7% $\mathrm{AP}_{50}^{\text{video}}$, surpassing the previous state-of-the-art by a large margin. VideoCutLER can also serve as a strong pretrained model for supervised video instance segmentation tasks, exceeding DINO by 15.9% on YouTubeVIS-2019 in terms of $\mathrm{AP}^{\text{video}}$.
Pages: 22755-22764
Page count: 10