Improving Video Segmentation via Dynamic Anchor Queries

Cited by: 0
Authors
Zhou, Yikang [1]
Zhang, Tao [1, 2]
Ji, Shunping [1]
Yan, Shuicheng [2]
Li, Xiangtai [2]
Affiliations
[1] Wuhan Univ, Wuhan, Peoples R China
[2] Skywork AI, Singapore, Singapore
Source
COMPUTER VISION - ECCV 2024, PT L | 2025 / Vol. 15108
Keywords
Video segmentation; Dynamic anchor design; Universal segmentation; TRACKING
DOI
10.1007/978-3-031-72973-7_26
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Modern video segmentation methods adopt feature transitions between anchor and target queries to perform cross-frame object association. The smooth feature transitions between anchor and target queries enable these methods to achieve satisfactory performance when tracking continuously appearing objects. However, the emergence and disappearance of objects interrupt this smooth transition and even widen the feature gap between anchor and target queries, so all of these methods underperform on newly emerging and disappearing objects, which are common in the real world. We introduce Dynamic Anchor Queries (DAQ) to shorten the transition gap by dynamically generating anchor queries based on the features of potential newly emerging and disappearing candidates. Furthermore, we introduce a query-level object Emergence and Disappearance Simulation (EDS) strategy, which unleashes DAQ's potential without any additional cost. Finally, we combine our proposed DAQ and EDS with the previous method, DVIS, to obtain DVIS-DAQ. Extensive experiments demonstrate that DVIS-DAQ achieves new state-of-the-art (SOTA) performance on five mainstream video segmentation benchmarks.
Pages: 446-463
Page count: 18