Holistic Prototype Attention Network for Few-Shot Video Object Segmentation

Cited: 8
Authors
Tang, Yin [1 ]
Chen, Tao [1 ]
Jiang, Xiruo [1 ]
Yao, Yazhou [1 ]
Xie, Guo-Sen [1 ]
Shen, Heng-Tao [2 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Prototypes; Task analysis; Object segmentation; Semantic segmentation; Semantics; Feature extraction; Annotations; Few-shot video object segmentation; video object segmentation; few-shot semantic segmentation;
DOI
10.1109/TCSVT.2023.3296629
CLC Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Discipline Code
0808 ; 0809 ;
Abstract
Few-shot video object segmentation (FSVOS) aims to segment dynamic objects of unseen classes by resorting to a small set of support images that contain pixel-level object annotations. Existing methods have demonstrated that the domain agent-based attention mechanism is effective in FSVOS by learning the correlation between support images and query frames. However, the agent frame contains redundant pixel information and background noise, resulting in inferior segmentation performance. Moreover, existing methods tend to ignore inter-frame correlations in query videos. To alleviate the above dilemma, we propose a holistic prototype attention network (HPAN) for advancing FSVOS. Specifically, HPAN introduces a prototype graph attention module (PGAM) and a bidirectional prototype attention module (BPAM), transferring informative knowledge from seen to unseen classes. PGAM generates local prototypes from all foreground features and then utilizes their internal correlations to enhance the representation of the holistic prototypes. BPAM exploits the holistic information from support images and video frames by fusing co-attention and self-attention to achieve support-query semantic consistency and inter-frame temporal consistency. Extensive experiments on YouTube-FSVOS have been provided to demonstrate the effectiveness and superiority of our proposed HPAN method. Our source code and models are available anonymously at https://github.com/NUST-Machine-Intelligence-Laboratory/HPAN.
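The prototype idea summarized above — collapsing foreground support features into compact vectors and matching query locations against them — can be illustrated with a generic sketch. This is not the paper's HPAN implementation (which builds multiple local prototypes and fuses graph, co-, and self-attention); it shows only the common masked-average-pooling and cosine-matching steps that such prototype-based few-shot segmentation methods start from, with all shapes and function names assumed for illustration.

```python
import numpy as np

def masked_average_prototype(features, mask):
    """Collapse foreground support features into one prototype via
    masked average pooling (a standard few-shot segmentation step;
    illustrative only, not the exact HPAN formulation).

    features: (C, H, W) support feature map
    mask:     (H, W) binary foreground mask
    returns:  (C,) prototype vector
    """
    fg = mask.astype(features.dtype)              # (H, W)
    denom = fg.sum() + 1e-8                       # guard against empty masks
    return (features * fg[None]).sum(axis=(1, 2)) / denom

def cosine_similarity_map(features, prototype):
    """Score each query location against the prototype with cosine
    similarity, giving a coarse foreground probability map.

    features:  (C, H, W) query feature map
    prototype: (C,)
    returns:   (H, W) similarities in [-1, 1]
    """
    f = features / (np.linalg.norm(features, axis=0, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return np.einsum("chw,c->hw", f, p)

# Toy example: 4-channel features on a 2x2 grid, one foreground pixel.
feat = np.zeros((4, 2, 2))
feat[:, 0, 0] = [1.0, 0.0, 0.0, 0.0]   # "foreground" pixel
feat[:, 1, 1] = [0.0, 1.0, 0.0, 0.0]   # "background" pixel
mask = np.array([[1, 0], [0, 0]])

proto = masked_average_prototype(feat, mask)
sim = cosine_similarity_map(feat, proto)   # high at (0,0), low elsewhere
```

In HPAN, PGAM would refine many such local prototypes through their mutual correlations before matching, and BPAM would replace the plain cosine matching with bidirectional attention across support images and query frames.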
Pages: 6699 - 6709 (11 pages)