XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

Cited by: 202
Authors
Cheng, Ho Kei [1]
Schwing, Alexander G. [1]
Affiliations
[1] Univ Illinois, Champaign, IL 61820 USA
Source
COMPUTER VISION - ECCV 2022, PT XXVIII | 2022 / Vol. 13688
DOI
10.1007/978-3-031-19815-1_37
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically only uses one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply-connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact thus sustained long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (that do not work on long videos) on short-video datasets.
Pages: 640-658
Page count: 19
Related papers
64 entries in total
[1]  
[Anonymous], 1968, PSYCHOL LEARNING MOT, DOI 10.1016/S0079-7421(08)60422-3
[2]   Learning What to Learn for Video Object Segmentation [J].
Bhat, Goutam ;
Lawin, Felix Jaremo ;
Danelljan, Martin ;
Robinson, Andreas ;
Felsberg, Michael ;
Van Gool, Luc ;
Timofte, Radu .
COMPUTER VISION - ECCV 2020, PT II, 2020, 12347 :777-794
[3]   One-Shot Video Object Segmentation [J].
Caelles, S. ;
Maninis, K. -K. ;
Pont-Tuset, J. ;
Leal-Taixe, L. ;
Cremers, D. ;
Van Gool, L. .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :5320-5329
[4]   State-Aware Tracker for Real-Time Video Object Segmentation [J].
Chen, Xi ;
Li, Zuoxin ;
Yuan, Ye ;
Yu, Gang ;
Shen, Jianxin ;
Qi, Donglian .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :9381-9390
[5]   Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning [J].
Chen, Yuhua ;
Pont-Tuset, Jordi ;
Montes, Alberto ;
Van Gool, Luc .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :1189-1198
[6]   CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement [J].
Cheng, Ho Kei ;
Chung, Jihoon ;
Tai, Yu-Wing ;
Tang, Chi-Keung .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :8887-8896
[7]  
Cheng Ho Kei, 2021, ADV NEUR IN, V34
[8]  
Cheng Ho Kei, 2021, CVPR
[9]   Fast and Accurate Online Video Object Segmentation via Tracking Parts [J].
Cheng, Jingchun ;
Tsai, Yi-Hsuan ;
Hung, Wei-Chih ;
Wang, Shengjin ;
Yang, Ming-Hsuan .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :7415-7424
[10]  
Cho K., 2014, PROC 8 WORKSHOP SYNT, P103, DOI 10.3115/V1/W14-4012