XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

Cited by: 119
Authors
Cheng, Ho Kei [1]
Schwing, Alexander G. [1]
Affiliations
[1] Univ Illinois, Champaign, IL 61820 USA
Source
COMPUTER VISION - ECCV 2022, PT XXVIII | 2022 / Vol. 13688
DOI
10.1007/978-3-031-19815-1_37
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically only uses one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply-connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact thus sustained long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (that do not work on long videos) on short-video datasets.
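The consolidation idea in the abstract can be illustrated with a toy sketch. This is not the XMem implementation: the class names `Entry` and `ToyMemory`, the parameters `working_capacity` and `keep_top_k`, the use of a single float as a stand-in for a feature vector, and the "keep the most-read entries" rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    frame: int       # index of the frame this feature came from
    feature: float   # stand-in for a feature vector (assumption)
    usage: int = 0   # how many times a read has selected this entry


@dataclass
class ToyMemory:
    working_capacity: int = 4   # hypothetical size limit for working memory
    keep_top_k: int = 2         # hypothetical number of entries kept on consolidation
    sensory: float = 0.0        # overwritten on every frame
    working: List[Entry] = field(default_factory=list)
    long_term: List[Entry] = field(default_factory=list)

    def read(self, query: float) -> Optional[Entry]:
        """Return the stored entry closest to the query and count the access."""
        candidates = self.working + self.long_term
        if not candidates:
            return None
        best = min(candidates, key=lambda e: abs(e.feature - query))
        best.usage += 1
        return best

    def write(self, frame: int, feature: float) -> None:
        """Update sensory memory and append to working memory, consolidating first if full."""
        self.sensory = feature
        if len(self.working) >= self.working_capacity:
            self.consolidate()
        self.working.append(Entry(frame, feature))

    def consolidate(self) -> None:
        """Move the most frequently read working entries to long-term memory, drop the rest."""
        ranked = sorted(self.working, key=lambda e: e.usage, reverse=True)
        self.long_term.extend(ranked[: self.keep_top_k])
        self.working.clear()


mem = ToyMemory(working_capacity=3, keep_top_k=1)
for i, f in enumerate([0.0, 1.0, 2.0]):
    mem.write(i, f)
mem.read(1.1)
mem.read(0.9)
mem.write(3, 3.0)  # working memory is full, so consolidation runs first
```

In this sketch working memory stays bounded by `working_capacity`, while long-term memory grows only by `keep_top_k` entries per consolidation, which is the trade-off the abstract describes in general terms.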
Pages: 640-658
Page count: 19