XMem++: Production-level Video Segmentation From Few Annotated Frames

Cited by: 4

Authors:

Bekuzarov, Maksym ^{[1]}

Bermudez, Ariana ^{[1]}

Lee, Joon-Young ^{[3]}

Li, Hao ^{[1,2]}

Affiliations:

[1] MBZUAI, Abu Dhabi, United Arab Emirates

[2] Pinscreen, Hangzhou, People's Republic of China

[3] Adobe Research, San Francisco, CA, USA

Source:

2023 IEEE/CVF International Conference on Computer Vision (ICCV) | 2023

DOI:

10.1109/ICCV51070.2023.00065

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Despite advancements in user-guided video segmentation, extracting complex objects consistently for highly complex scenes is still a labor-intensive task, especially for production. It is not uncommon that a majority of frames need to be annotated. We introduce a novel semi-supervised video object segmentation (SSVOS) model, XMem++, that improves existing memory-based models, with a permanent memory module. Most existing methods focus on single frame annotations, while our approach can effectively handle multiple user-selected frames with varying appearances of the same object or region. Our method can extract highly consistent results while keeping the required number of frame annotations low. We further introduce an iterative and attention-based frame suggestion mechanism, which computes the next best frame for annotation. Our method is real-time and does not require retraining after each user input. We also introduce a new dataset, PUMaVOS, which covers new challenging use cases not found in previous benchmarks. We demonstrate SOTA performance on challenging (partial and multi-class) segmentation scenarios as well as long videos, while ensuring significantly fewer frame annotations than any existing method. Project page: https://max810.github.io/xmem2-project-page/

Pages: 635-644

Page count: 10

