Recurrent Dynamic Embedding for Video Object Segmentation

Times cited: 48

Authors:

Li, Mingxing [1, 3]

Hu, Li [2]

Xiong, Zhiwei [1]

Zhang, Bang [2]

Pan, Pan [2]

Liu, Dong [1]

Affiliations:

[1] Univ Sci & Technol China, Hefei, Peoples R China

[2] Alibaba Grp, Alibaba DAMO Acad, Hangzhou, Peoples R China

[3] Alibaba, Hangzhou, Peoples R China

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China

DOI:

10.1109/CVPR52688.2022.00139

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Space-time memory (STM) based video object segmentation (VOS) networks usually enlarge the memory bank every few frames, which yields excellent performance. However, (1) the hardware cannot withstand the ever-increasing memory requirements as the video length increases, and (2) storing a large amount of information inevitably introduces considerable noise, which makes it harder to read the most important information from the memory bank. In this paper, we propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size. Specifically, we explicitly generate and update the RDE with the proposed Spatio-temporal Aggregation Module (SAM), which exploits cues from historical information. To avoid error accumulation caused by the recurrent use of SAM, we propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos. Moreover, the masks stored in the memory bank are predicted by the network and are therefore inaccurate to varying degrees, which affects the segmentation of the query frame. To address this problem, we design a novel self-correction strategy so that the network can repair the embeddings of masks of different quality in the memory bank. Extensive experiments show that our method achieves the best trade-off between performance and speed.
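The abstract's central contrast, a memory bank that grows with video length versus one of constant size, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the "one entry every 5 frames" schedule, the list-of-floats features, and the fixed convex blend `aggregate` (a hypothetical stand-in for the learned Spatio-temporal Aggregation Module) are all assumptions, and the bank layout (annotated first frame plus one recurrent embedding) is assumed purely for illustration.

```python
from typing import List

# Hypothetical feature type: one flattened embedding per memory entry.
Embedding = List[float]


def stm_memory(frames: List[Embedding], every: int = 5) -> List[Embedding]:
    """STM-style bank: append one entry every `every` frames,
    so its size grows linearly with video length."""
    bank: List[Embedding] = []
    for t, feat in enumerate(frames):
        if t % every == 0:
            bank.append(list(feat))
    return bank


def aggregate(prev: Embedding, new: Embedding, alpha: float = 0.5) -> Embedding:
    """Hypothetical stand-in for SAM: a fixed convex blend of the previous
    recurrent embedding and the newest features (the real SAM is learned)."""
    return [(1.0 - alpha) * p + alpha * n for p, n in zip(prev, new)]


def rde_memory(frames: List[Embedding], every: int = 5, alpha: float = 0.5) -> List[Embedding]:
    """Constant-size bank (assumed layout): [first-frame entry, recurrent embedding].
    The recurrent embedding starts from frame 0 and is overwritten every `every` frames."""
    first = list(frames[0])
    rde = list(frames[0])
    for t in range(every, len(frames), every):
        rde = aggregate(rde, frames[t], alpha)  # update in place of appending
    return [first, rde]


if __name__ == "__main__":
    for n in (100, 1000):
        frames = [[float(t)] * 4 for t in range(n)]
        print(n, len(stm_memory(frames)), len(rde_memory(frames)))
    # 100 20 2
    # 1000 200 2
```

Running it prints `100 20 2` and `1000 200 2`: the STM-style bank grows with the number of frames, while the recurrent bank stays at two entries however long the video is. The unbiased guidance loss and the self-correction strategy are not modeled in this sketch.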

Pages: 1322-1331

Page count: 10

References: 44
