VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

Cited by: 1

Authors:

Wang, Xudong [1]

Misra, Ishan

Zeng, Ziyun

Girdhar, Rohit

Darrell, Trevor

Affiliation:

[1] Univ Calif Berkeley, Berkeley, CA 94720 USA

Source:

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024

Keywords:

DOI:

10.1109/CVPR52733.2024.02147

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing approaches to unsupervised video instance segmentation typically rely on motion estimates and experience difficulties tracking small or divergent motions. We present VideoCutLER, a simple method for unsupervised multi-instance video segmentation without using motion-based learning signals like optical flow or training on natural videos. Our key insight is that using high-quality pseudo masks and a simple video synthesis method for model training is surprisingly sufficient to enable the resulting video model to effectively segment and track multiple instances across video frames. We show the first competitive unsupervised learning results on the challenging YouTubeVIS-2019 benchmark, achieving 50.7% $\mathrm{AP}_{50}^{\mathrm{video}}$, surpassing the previous state-of-the-art by a large margin. VideoCutLER can also serve as a strong pretrained model for supervised video instance segmentation tasks, exceeding DINO by 15.9% on YouTubeVIS-2019 in terms of $\mathrm{AP}^{\mathrm{video}}$.


Pages: 22755-22764

Page count: 10
