XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

Cited by: 202

Authors:

Cheng, Ho Kei [1]

Schwing, Alexander G. [1]

Affiliations:

[1] Univ Illinois, Champaign, IL 61820 USA

Source:

COMPUTER VISION - ECCV 2022, PT XXVIII | 2022 / Vol. 13688

DOI:

10.1007/978-3-031-19815-1_37

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically only uses one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply-connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact thus sustained long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (that do not work on long videos) on short-video datasets.
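The consolidation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `ToyMemory`, its fields, the usage counter, and the parameters `working_capacity` and `keep_top` are all hypothetical names chosen for illustration, and plain Python values stand in for feature tensors. It only shows the general pattern of three stores, where the sensory store is overwritten every frame, the working store grows until a limit, and the most-used working entries are then promoted to a compact long-term store.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ToyMemory:
    """Toy three-store memory. All names and thresholds are assumptions."""
    working_capacity: int = 3            # working entries allowed before consolidation
    keep_top: int = 1                    # most-used entries promoted per consolidation
    sensory: Optional[Any] = None        # overwritten on every frame
    working: List[list] = field(default_factory=list)   # [feature, usage_count] pairs
    long_term: List[Any] = field(default_factory=list)  # compact, append-only store

    def observe(self, feature: Any) -> None:
        """Update sensory memory and add the frame to working memory."""
        self.sensory = feature
        self.working.append([feature, 0])
        if len(self.working) > self.working_capacity:
            self.consolidate()

    def mark_used(self, feature: Any) -> None:
        """Record that a working-memory entry contributed to a memory read."""
        for entry in self.working:
            if entry[0] == feature:
                entry[1] += 1

    def consolidate(self) -> None:
        """Promote the most-used older entries, keep only the newest in working memory."""
        *candidates, newest = self.working
        ranked = sorted(candidates, key=lambda e: e[1], reverse=True)
        self.long_term.extend(f for f, _ in ranked[: self.keep_top])
        self.working = [newest]


memory = ToyMemory(working_capacity=3, keep_top=1)
for frame in ["a", "b", "c"]:
    memory.observe(frame)
memory.mark_used("b")
memory.mark_used("b")
memory.mark_used("a")
memory.observe("d")        # exceeds capacity, so consolidation runs
print(memory.long_term)    # ['b']
```

Because only `keep_top` entries survive each consolidation, the long-term store in this sketch grows far more slowly than the number of observed frames, which is the trade-off between memory consumption and accuracy that the abstract describes.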

Pages: 640-658

Number of pages: 19

References (64 in total)

[1] [Anonymous], 1968, PSYCHOL LEARNING MOT, DOI 10.1016/S0079-7421(08)60422-3

[2] Learning What to Learn for Video Object Segmentation [J]. Bhat, Goutam; Lawin, Felix Jaremo; Danelljan, Martin; Robinson, Andreas; Felsberg, Michael; Van Gool, Luc; Timofte, Radu. COMPUTER VISION - ECCV 2020, PT II, 2020, 12347: 777-794

[3] One-Shot Video Object Segmentation [J]. Caelles, S.; Maninis, K.-K.; Pont-Tuset, J.; Leal-Taixe, L.; Cremers, D.; Van Gool, L. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 5320-5329

[4] State-Aware Tracker for Real-Time Video Object Segmentation [J]. Chen, Xi; Li, Zuoxin; Yuan, Ye; Yu, Gang; Shen, Jianxin; Qi, Donglian. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 9381-9390

[5] Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning [J]. Chen, Yuhua; Pont-Tuset, Jordi; Montes, Alberto; Van Gool, Luc. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 1189-1198

[6] CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement [J]. Cheng, Ho Kei; Chung, Jihoon; Tai, Yu-Wing; Tang, Chi-Keung. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 8887-8896

[7] Cheng Ho Kei, 2021, ADV NEUR IN, V34

[8] Cheng Ho Kei, 2021, CVPR

[9] Fast and Accurate Online Video Object Segmentation via Tracking Parts [J]. Cheng, Jingchun; Tsai, Yi-Hsuan; Hung, Wei-Chih; Wang, Shengjin; Yang, Ming-Hsuan. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 7415-7424

[10] Cho K., 2014, PROC 8 WORKSHOP SYNT, P103, DOI 10.3115/V1/W14-4012
