Improving Video Segmentation via Dynamic Anchor Queries

Cited by: 0

Authors:

Zhou, Yikang [1]

Zhang, Tao [1,2]

Ji, Shunping [1]

Yan, Shuicheng [2]

Li, Xiangtai [2]

Affiliations:

[1] Wuhan University, Wuhan, People's Republic of China

[2] Skywork AI, Singapore, Singapore

Source:

COMPUTER VISION - ECCV 2024, PT L | 2025 / Vol. 15108

Keywords:

Video segmentation; Dynamic anchor design; Universal segmentation; TRACKING

DOI:

10.1007/978-3-031-72973-7_26

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Modern video segmentation methods adopt feature transitions between anchor and target queries to perform cross-frame object association. The smooth feature transitions between anchor and target queries enable these methods to achieve satisfactory performance when tracking continuously appearing objects. However, the emergence and disappearance of objects interrupt the smooth feature transition, and even widen this feature transition gap between anchor and target queries, which causes these methods to all underperform on newly emerging and disappearing objects that are common in the real world. We introduce Dynamic Anchor Queries (DAQ) to shorten the transition gap by dynamically generating anchor queries based on the features of potential newly emerging and disappearing candidates. Furthermore, we introduce a query-level object Emergence and Disappearance Simulation (EDS) strategy, which unleashes DAQ's potential without any additional cost. Finally, we combine our proposed DAQ and EDS with the previous method, DVIS, to obtain DVIS-DAQ. Extensive experiments demonstrate that DVIS-DAQ achieves a new state-of-the-art (SOTA) performance on five mainstream video segmentation benchmarks.

Pages: 446-463

Page count: 18

