Adaptive Sparse Memory Networks for Efficient and Robust Video Object Segmentation

Cited by: 2
Authors
Dang, Jisheng [1, 2]
Zheng, Huicheng [1, 2]
Xu, Xiaohao [3]
Wang, Longguang [4]
Hu, Qingyong [5]
Guo, Yulan [6]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Key Lab Machine Intelligence & Adv Comp, Minist Educ, Guangzhou 510006, Peoples R China
[2] Sun Yat Sen Univ, Guangdong Key Lab Informat Secur Technol, Guangzhou 510006, Peoples R China
[3] Univ Michigan, Inst Robot, Ann Arbor, MI 48109 USA
[4] Natl Univ Def Technol, Coll Elect Sci & Technol, Changsha 410000, Peoples R China
[5] Univ Oxford, Dept Comp Sci, Oxford OX1 2JD, England
[6] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen 518000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Adaptive sparse memory network (ASM); attentive local memory reader (ALMR); video object segmentation (VOS);
DOI
10.1109/TNNLS.2024.3357118
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, memory-based networks have achieved promising performance for video object segmentation (VOS). However, existing methods still suffer from unsatisfactory segmentation accuracy and inferior efficiency. The reasons are mainly twofold: 1) during memory construction, the inflexible memory storage mechanism results in a weak discriminative ability for similar appearances in complex scenarios, leading to video-level temporal redundancy, and 2) during memory reading, matching robustness and memory retrieval accuracy decrease as the number of video frames increases. To address these challenges, we propose an adaptive sparse memory network (ASM) that efficiently and effectively performs VOS by sparsely leveraging previous guidance while attending to key information. Specifically, we design an adaptive sparse memory constructor (ASMC) to adaptively memorize informative past frames according to dynamic temporal changes in video frames. Furthermore, we introduce an attentive local memory reader (ALMR) to quickly retrieve relevant information using a subset of memory, thereby reducing frame-level redundant computation and noise in a simpler and more convenient manner. To prevent key features from being discarded by the subset of memory, we further propose a novel attentive local feature aggregation (ALFA) module, which preserves useful cues by selectively aggregating discriminative spatial dependence from adjacent frames, thereby effectively increasing the receptive field of each memory frame. Extensive experiments demonstrate that our model achieves state-of-the-art performance with real-time speed on six popular VOS benchmarks. Furthermore, our ASM can be applied to existing memory-based methods as generic plugins to achieve significant performance improvements. More importantly, our method exhibits robustness in handling sparse videos with low frame rates.
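The two ideas the abstract describes, memorizing a frame only when the video has changed enough and reading from a small subset of memory, can be illustrated with a toy sketch. This is not the authors' ASMC or ALMR. The function names, the cosine-distance criterion, the `threshold` value, and the "k most recent memorized frames" reading rule are all assumptions made for illustration, and each frame is reduced to a single hypothetical feature vector.

```python
from math import sqrt


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat a zero vector as maximally different
    return 1.0 - dot / (norm_a * norm_b)


def select_memory_frames(features, threshold=0.1):
    """Assumed sparse-memory rule: keep frame i only if it differs from the
    most recently memorized frame by more than `threshold`."""
    if not features:
        return []
    kept = [0]  # the first frame (with its given mask) is always kept
    for i in range(1, len(features)):
        if cosine_distance(features[kept[-1]], features[i]) > threshold:
            kept.append(i)
    return kept


def read_local_memory(kept, query_index, k=2):
    """Assumed local-reading rule: use only the k most recent memorized
    frames that precede the query frame."""
    past = [i for i in kept if i < query_index]
    return past[-k:]


# Hypothetical per-frame feature vectors; frames 1 and 3 are near-duplicates
# of the frames before them.
features = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.0, 1.02], [1.0, 1.0]]
kept = select_memory_frames(features)
subset = read_local_memory(kept, query_index=4, k=2)
```

In this toy run, the near-duplicate frames are skipped during memory construction, and the query frame is matched against a bounded number of memorized frames rather than the whole history, which is the efficiency argument the abstract makes.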
Pages: 3820-3833
Page count: 14