Exploring the Better Correlation for Few-Shot Video Object Segmentation

Cited by: 0
Authors
Luo, Naisong [1 ]
Wang, Yuan [1 ]
Sun, Rui [1 ]
Xiong, Guoxin [1 ]
Zhang, Tianzhu [1 ,2 ]
Wu, Feng [1 ,2 ]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci, Hefei 230027, Peoples R China
[2] Deep Space Explorat Lab, Hefei 230088, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Few-shot video object segmentation; video object segmentation; few-shot learning
DOI
10.1109/TCSVT.2024.3491214
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology & Communication Technology]
Discipline Codes
0808; 0809
Abstract
Few-shot video object segmentation (FSVOS) aims to accurately segment novel objects in a given video sequence, where the target objects are specified by a limited number of annotated support images. Most previous top-performing methods adopt either the support-query semantic correlation learning paradigm or the intra-query temporal correlation learning paradigm. However, the former fails to model temporal consistency across frames, resulting in inconsecutive segmentation, while the latter loses diverse support object information, leading to incomplete segmentation. We therefore argue that it is more desirable to model both correlations collaboratively. In this work, we examine the issues that arise when combining few-shot image segmentation methods with video object segmentation methods, and propose a dedicated Collaborative Correlation Network (CoCoNet) to address them, comprising a pixel correlation calibration module and a temporal correlation mining module. The proposed CoCoNet offers several merits. First, the pixel correlation calibration module mitigates the noise in support-query correlation by integrating an affinity learning strategy with a prototype learning strategy. Specifically, we employ Optimal Transport to enrich the pixel correlation with contextual information, thereby reducing intra-class differences between support and query. Second, the temporal correlation mining module alleviates the uncertainty in the initial frame and establishes reliable guidance for subsequent frames of the query video. With the collaboration of these two modules, CoCoNet effectively establishes support-query and temporal correlations simultaneously and achieves accurate FSVOS. Extensive experiments on two challenging benchmarks demonstrate that our method performs favorably against state-of-the-art FSVOS methods.
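The abstract does not specify how Optimal Transport is instantiated inside the pixel correlation calibration module. As a rough illustration only, the following is a minimal Python/NumPy sketch of entropic OT (Sinkhorn iterations) used to turn a raw support-query cosine affinity into a globally coupled correlation map; all names here (sinkhorn, calibrated_correlation, eps, n_iters) and the uniform-marginal choice are assumptions for illustration, not the paper's implementation.

    import numpy as np

    def sinkhorn(cost, eps=0.05, n_iters=50):
        # Entropic optimal transport via Sinkhorn scaling.
        # cost: (Nq, Ns) pairwise cost between query and support pixels.
        # Returns a transport plan of the same shape with uniform marginals
        # (the paper may use mask-weighted marginals instead).
        Nq, Ns = cost.shape
        K = np.exp(-cost / eps)            # Gibbs kernel
        a = np.full(Nq, 1.0 / Nq)          # uniform query marginal
        b = np.full(Ns, 1.0 / Ns)          # uniform support marginal
        u = np.ones(Nq)
        v = np.ones(Ns)
        for _ in range(n_iters):           # alternating scaling updates
            u = a / (K @ v)
            v = b / (K.T @ u)
        return u[:, None] * K * v[None, :] # transport plan P

    def calibrated_correlation(query_feats, support_feats):
        # query_feats: (Nq, C), support_feats: (Ns, C) pixel embeddings.
        # Replaces the raw per-pixel cosine affinity with an OT plan whose
        # marginal constraints couple all pixels jointly, so each match is
        # informed by global context rather than an independent nearest
        # neighbor; hypothetical rescaling makes it usable as attention.
        q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
        s = support_feats / np.linalg.norm(support_feats, axis=1, keepdims=True)
        sim = q @ s.T                      # cosine similarity in [-1, 1]
        plan = sinkhorn(1.0 - sim)         # cost = 1 - similarity
        return plan / plan.max()

    # Toy usage: 1024 query pixels, 256 support pixels, 64-d features.
    rng = np.random.default_rng(0)
    corr = calibrated_correlation(rng.standard_normal((1024, 64)),
                                  rng.standard_normal((256, 64)))
    print(corr.shape)  # (1024, 256)

Intuitively, the marginal constraints force matching mass to be distributed across the whole support object, which is one plausible reading of "enriching pixel correlation with contextual information" to suppress noisy one-to-one matches.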
Pages: 2133-2146
Number of pages: 14