Improving Video Segmentation via Dynamic Anchor Queries

Cited by: 0
Authors
Zhou, Yikang [1]
Zhang, Tao [1, 2]
Ji, Shunping [1]
Yan, Shuicheng [2]
Li, Xiangtai [2]
Affiliations
[1] Wuhan Univ, Wuhan, Peoples R China
[2] Skywork AI, Singapore, Singapore
Source
COMPUTER VISION - ECCV 2024, PT L | 2025 / Vol. 15108
Keywords
Video segmentation; Dynamic anchor design; Universal segmentation; TRACKING;
DOI
10.1007/978-3-031-72973-7_26
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Modern video segmentation methods adopt feature transitions between anchor and target queries to perform cross-frame object association. The smooth feature transitions between anchor and target queries enable these methods to achieve satisfactory performance when tracking continuously appearing objects. However, the emergence and disappearance of objects interrupt the smooth feature transition and even widen the feature transition gap between anchor and target queries, causing all of these methods to underperform on newly emerging and disappearing objects, which are common in the real world. We introduce Dynamic Anchor Queries (DAQ) to shorten the transition gap by dynamically generating anchor queries based on the features of potential newly emerging and disappearing candidates. Furthermore, we introduce a query-level object Emergence and Disappearance Simulation (EDS) strategy, which unleashes DAQ's potential without any additional cost. Finally, we combine our proposed DAQ and EDS with the previous method, DVIS, to obtain DVIS-DAQ. Extensive experiments demonstrate that DVIS-DAQ achieves new state-of-the-art (SOTA) performance on five mainstream video segmentation benchmarks.
Pages: 446-463
Page count: 18