Adaptive Sparse Memory Networks for Efficient and Robust Video Object Segmentation

Cited by: 2

Authors:

Dang, Jisheng ^{[1,2]}

Zheng, Huicheng ^{[1,2]}

Xu, Xiaohao ^{[3]}

Wang, Longguang ^{[4]}

Hu, Qingyong ^{[5]}

Guo, Yulan ^{[6]}

Affiliations:

[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Key Lab Machine Intelligence & Adv Comp, Minist Educ, Guangzhou 510006, Peoples R China

[2] Sun Yat Sen Univ, Guangdong Key Lab Informat Secur Technol, Guangzhou 510006, Peoples R China

[3] Univ Michigan, Inst Robot, Ann Arbor, MI 48109 USA

[4] Natl Univ Def Technol, Coll Elect Sci & Technol, Changsha 410000, Peoples R China

[5] Univ Oxford, Dept Comp Sci, Oxford OX1 2JD, England

[6] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen 518000, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2025 / Vol. 36 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Adaptive sparse memory network (ASM); attentive local memory reader (ALMR); video object segmentation (VOS);

DOI:

10.1109/TNNLS.2024.3357118

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, memory-based networks have achieved promising performance for video object segmentation (VOS). However, existing methods still suffer from unsatisfactory segmentation accuracy and inferior efficiency. The reasons are mainly twofold: 1) during memory construction, the inflexible memory storage mechanism results in a weak discriminative ability for similar appearances in complex scenarios, leading to video-level temporal redundancy, and 2) during memory reading, matching robustness and memory retrieval accuracy decrease as the number of video frames increases. To address these challenges, we propose an adaptive sparse memory network (ASM) that efficiently and effectively performs VOS by sparsely leveraging previous guidance while attending to key information. Specifically, we design an adaptive sparse memory constructor (ASMC) to adaptively memorize informative past frames according to dynamic temporal changes in video frames. Furthermore, we introduce an attentive local memory reader (ALMR) to quickly retrieve relevant information using a subset of memory, thereby reducing frame-level redundant computation and noise in a simpler and more convenient manner. To prevent key features from being discarded by the subset of memory, we further propose a novel attentive local feature aggregation (ALFA) module, which preserves useful cues by selectively aggregating discriminative spatial dependence from adjacent frames, thereby effectively increasing the receptive field of each memory frame. Extensive experiments demonstrate that our model achieves state-of-the-art performance with real-time speed on six popular VOS benchmarks. Furthermore, our ASM can be applied to existing memory-based methods as generic plugins to achieve significant performance improvements. More importantly, our method exhibits robustness in handling sparse videos with low frame rates.
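The idea of memorizing past frames sparsely, according to how much the video changes over time, can be illustrated with a toy sketch. This is not the paper's ASMC: the abstract does not specify the selection rule, so the mean-absolute-difference criterion, the `threshold` parameter, and the names `mean_abs_diff` and `select_memory_frames` are all assumptions made for illustration only.

```python
from typing import List, Sequence

# Hypothetical representation: each frame is a flattened feature vector.
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute element-wise difference between two frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_memory_frames(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Return indices of frames to keep in memory.

    Assumed rule (not taken from the paper): always keep the first frame,
    then keep a frame only when it differs from the most recently kept
    frame by more than `threshold`.
    """
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    video = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 1.05], [3.0, 3.0]]
    print(select_memory_frames(video, threshold=0.5))
```

Under this assumed rule, near-duplicate frames are skipped, so memory grows with the amount of temporal change rather than with the raw frame count.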


Pages: 3820-3833

Number of pages: 14
