Adaptive Sparse Memory Networks for Efficient and Robust Video Object Segmentation

Cited by: 2

Authors:

Dang, Jisheng ^{[1,2]}

Zheng, Huicheng ^{[1,2]}

Xu, Xiaohao ^{[3]}

Wang, Longguang ^{[4]}

Hu, Qingyong ^{[5]}

Guo, Yulan ^{[6]}

Affiliations:

[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Key Lab Machine Intelligence & Adv Comp, Minist Educ, Guangzhou 510006, Peoples R China

[2] Sun Yat Sen Univ, Guangdong Key Lab Informat Secur Technol, Guangzhou 510006, Peoples R China

[3] Univ Michigan, Inst Robot, Ann Arbor, MI 48109 USA

[4] Natl Univ Def Technol, Coll Elect Sci & Technol, Changsha 410000, Peoples R China

[5] Univ Oxford, Dept Comp Sci, Oxford OX1 2JD, England

[6] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen 518000, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2025 / Vol. 36 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Adaptive sparse memory network (ASM); attentive local memory reader (ALMR); video object segmentation (VOS);

DOI:

10.1109/TNNLS.2024.3357118

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, memory-based networks have achieved promising performance for video object segmentation (VOS). However, existing methods still suffer from unsatisfactory segmentation accuracy and inferior efficiency. The reasons are mainly twofold: 1) during memory construction, the inflexible memory storage mechanism results in a weak discriminative ability for similar appearances in complex scenarios, leading to video-level temporal redundancy, and 2) during memory reading, matching robustness and memory retrieval accuracy decrease as the number of video frames increases. To address these challenges, we propose an adaptive sparse memory network (ASM) that efficiently and effectively performs VOS by sparsely leveraging previous guidance while attending to key information. Specifically, we design an adaptive sparse memory constructor (ASMC) to adaptively memorize informative past frames according to dynamic temporal changes in video frames. Furthermore, we introduce an attentive local memory reader (ALMR) to quickly retrieve relevant information using a subset of memory, thereby reducing frame-level redundant computation and noise in a simpler and more convenient manner. To prevent key features from being discarded by the subset of memory, we further propose a novel attentive local feature aggregation (ALFA) module, which preserves useful cues by selectively aggregating discriminative spatial dependence from adjacent frames, thereby effectively increasing the receptive field of each memory frame. Extensive experiments demonstrate that our model achieves state-of-the-art performance with real-time speed on six popular VOS benchmarks. Furthermore, our ASM can be applied to existing memory-based methods as generic plugins to achieve significant performance improvements. More importantly, our method exhibits robustness in handling sparse videos with low frame rates.


Pages: 3820-3833

Page count: 14

References: 75 in total

