XMem++: Production-level Video Segmentation From Few Annotated Frames

Cited by: 4
Authors
Bekuzarov, Maksym [1]
Bermudez, Ariana [1]
Lee, Joon-Young [3]
Li, Hao [1, 2]
Affiliations
[1] MBZUAI, Abu Dhabi, United Arab Emirates
[2] Pinscreen, Hangzhou, People's Republic of China
[3] Adobe Research, San Francisco, CA, USA
Source
2023 IEEE/CVF International Conference on Computer Vision (ICCV) | 2023
Keywords
DOI
10.1109/ICCV51070.2023.00065
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Despite advancements in user-guided video segmentation, extracting complex objects consistently for highly complex scenes is still a labor-intensive task, especially for production. It is not uncommon that a majority of frames need to be annotated. We introduce a novel semi-supervised video object segmentation (SSVOS) model, XMem++, that improves existing memory-based models, with a permanent memory module. Most existing methods focus on single frame annotations, while our approach can effectively handle multiple user-selected frames with varying appearances of the same object or region. Our method can extract highly consistent results while keeping the required number of frame annotations low. We further introduce an iterative and attention-based frame suggestion mechanism, which computes the next best frame for annotation. Our method is real-time and does not require retraining after each user input. We also introduce a new dataset, PUMaVOS, which covers new challenging use cases not found in previous benchmarks. We demonstrate SOTA performance on challenging (partial and multi-class) segmentation scenarios as well as long videos, while ensuring significantly fewer frame annotations than any existing method. Project page: https://max810.github.io/xmem2-project-page/
Pages: 635-644
Number of pages: 10