Dual temporal memory network with high-order spatio-temporal graph learning for video object segmentation

Cited by: 0

Authors:

Fan, Jiaqing [1]

Hu, Shenglong [2]

Wang, Long [3]

Zhang, Kaihua [2]

Liu, Bo [4]

Affiliations:

[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China

[2] Nanjing Univ Informat Sci & Technol, Sch Comp & Sci, Nanjing, Peoples R China

[3] Guodian Nanjing Automat Co Ltd, Nanjing, Peoples R China

[4] Walmart Global Tech, Sunnyvale, CA USA

Source:

IMAGE AND VISION COMPUTING | 2024 / Vol. 150

Keywords:

Semi-supervised learning; Video object segmentation; Graph convolution networks; High-order graph learning; Direction-aware attention;

DOI:

10.1016/j.imavis.2024.105208

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video Object Segmentation (VOS) is typically evaluated in a semi-supervised setting at test time. Given only the ground-truth segmentation mask of the initial frame, VOS aims to track and segment one or more target objects in the subsequent frames of the sequence. A fundamental issue in VOS is how best to utilize temporal information to improve accuracy. To address this issue, we propose an end-to-end framework that simultaneously extracts long-term and short-term historical sequential information and supplies it to the current frame as temporal memories. The integrated temporal architecture consists of a short-term memory module and a long-term memory module. Specifically, the short-term memory module leverages a high-order graph-based learning framework to model the fine-grained spatio-temporal interactions between local regions across neighboring frames in a video, thereby maintaining spatio-temporal visual consistency in local regions. Meanwhile, to alleviate occlusion and drift, the long-term memory module employs a Simplified Gated Recurrent Unit (S-GRU) to model long-range temporal evolution in a video. Furthermore, we design a novel direction-aware attention module that complementarily augments the object representation for more robust segmentation. Experiments on three mainstream VOS benchmarks, namely DAVIS 2017, DAVIS 2016, and YouTube-VOS, demonstrate that the proposed solution achieves a fair trade-off between speed and accuracy.
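For orientation, the sketch below shows how a gated recurrent unit can carry a memory vector across frames. The abstract does not define the S-GRU, so this is one common formulation of a standard GRU update and should not be read as the paper's module. The names `gru_step` and `run_memory`, the parameter layout, and the use of plain feature vectors per frame are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(x, h, params):
    """One standard GRU update (illustrative; not the paper's S-GRU).

    x: feature vector of the current frame (hypothetical input).
    h: memory vector carried over from earlier frames.
    params: (Wz, Uz, Wr, Ur, Wh, Uh) weight matrices, biases omitted.
    """
    Wz, Uz, Wr, Ur, Wh, Uh = params
    # Update gate: how much of the candidate state replaces the old memory.
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    # Reset gate: how much of the old memory feeds the candidate state.
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, rh))]
    # Blend old memory and candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_memory(frames, h0, params):
    # Fold the update over a sequence of per-frame feature vectors.
    h = h0
    for x in frames:
        h = gru_step(x, h, params)
    return h
```

With all weights set to zero, both gates evaluate to 0.5 and the candidate state is zero, so each step simply halves the memory. This makes the blending behaviour of the update gate easy to check by hand.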


Pages: 10
