Improving Video Segmentation via Dynamic Anchor Queries

Times cited: 0

Authors:

Zhou, Yikang [1]

Zhang, Tao [1,2]

Ji, Shunping [1]

Yan, Shuicheng [2]

Li, Xiangtai [2]

Affiliations:

[1] Wuhan Univ, Wuhan, Peoples R China

[2] Skywork AI, Singapore, Singapore

Source:

COMPUTER VISION - ECCV 2024, PT L | 2025 / Vol. 15108

Keywords:

Video segmentation; Dynamic anchor design; Universal segmentation; Tracking

DOI:

10.1007/978-3-031-72973-7_26

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Modern video segmentation methods rely on feature transitions between anchor and target queries to perform cross-frame object association. Because these transitions are smooth for continuously appearing objects, such methods achieve satisfactory performance when tracking them. However, the emergence and disappearance of objects interrupt the smooth feature transition and can even widen the feature transition gap between anchor and target queries, causing all of these methods to underperform on newly emerging and disappearing objects, which are common in the real world. We introduce Dynamic Anchor Queries (DAQ), which shorten the transition gap by dynamically generating anchor queries from the features of potential newly emerging and disappearing candidates. We further introduce a query-level object Emergence and Disappearance Simulation (EDS) strategy, which unleashes DAQ's potential without any additional cost. Finally, we combine DAQ and EDS with the previous method, DVIS, to obtain DVIS-DAQ. Extensive experiments show that DVIS-DAQ achieves new state-of-the-art (SOTA) performance on five mainstream video segmentation benchmarks.
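The abstract describes EDS only as a query-level simulation of object emergence and disappearance that adds no extra cost. A minimal sketch of what such an augmentation could look like follows, assuming that per-object query features are available for every frame of a training clip and that emergence and disappearance can be mimicked by hiding an object's queries before or after a random cut frame. The function `simulate_emergence_disappearance`, the probability `p`, and the list-of-floats query representation are hypothetical illustrations, not the DVIS-DAQ implementation.

```python
import random
from typing import List, Optional

# Hypothetical representation: one object query feature vector per frame.
Query = List[float]


def simulate_emergence_disappearance(
    tracks: List[List[Query]],
    p: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[List[Optional[Query]]]:
    """Hypothetical query-level emergence/disappearance simulation.

    tracks[i][t] is the query of object i at frame t, with every object
    visible in all T frames. With probability p, an object is hidden
    (its query replaced by None) either before a random cut frame
    (simulated emergence) or from the cut frame onwards (simulated
    disappearance). The input lists are not modified.
    """
    rng = rng or random.Random()
    out: List[List[Optional[Query]]] = []
    for track in tracks:
        aug: List[Optional[Query]] = list(track)  # shallow copy per object
        T = len(aug)
        if T >= 2 and rng.random() < p:
            # cut in [1, T-1]: at least one frame hidden, at least one visible
            cut = rng.randint(1, T - 1)
            if rng.random() < 0.5:
                # Emergence: object absent in frames 0 .. cut-1
                aug[:cut] = [None] * cut
            else:
                # Disappearance: object absent in frames cut .. T-1
                aug[cut:] = [None] * (T - cut)
        out.append(aug)
    return out
```

Because a sketch like this only hides queries that were already computed, it needs no extra forward passes, which is one plausible reading of "without any additional cost".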


Pages: 446-463

Page count: 18

Cited references: 79
