SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization

Cited by: 28
Authors
Lin, Zhihui [1, 2]
Yang, Tianyu [2]
Li, Maomao [2]
Wang, Ziyu [3]
Yuan, Chun [4]
Jiang, Wenhao [3]
Liu, Wei [3]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] Tencent AI Lab, Shenzhen, Peoples R China
[3] Tencent Data Platform, Shenzhen, Peoples R China
[4] Tsinghua Shenzhen Int Grad Sch, Peng Cheng Lab, Shenzhen, Peoples R China
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022
DOI
10.1109/CVPR52688.2022.00142
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Matching-based methods, especially those based on space-time memory, are significantly ahead of other solutions in semi-supervised video object segmentation (VOS). However, continuously growing and redundant template features lead to inefficient inference. To alleviate this, we propose a novel Sequential Weighted Expectation-Maximization (SWEM) network to greatly reduce the redundancy of memory features. Unlike previous methods, which only detect feature redundancy between frames, SWEM merges both intra-frame and inter-frame similar features by leveraging the sequential weighted EM algorithm. Further, adaptive weights for frame features endow SWEM with the flexibility to represent hard samples, improving the discrimination of templates. In addition, the proposed method maintains a fixed number of template features in memory, which ensures stable inference complexity for the VOS system. Extensive experiments on the commonly used DAVIS and YouTube-VOS datasets verify the high efficiency (36 FPS) and high performance (84.3% J&F on the DAVIS 2017 validation set) of SWEM.
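The idea of a fixed-size memory updated by sequential weighted EM can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`l2_normalize`, `soft_assign`, `update_memory`), the softmax temperature, the iteration count, and the uniform per-feature weights are all assumptions chosen for illustration. Each frame's features are softly assigned to K bases (E-step), and the bases are re-estimated from weighted statistics accumulated over all frames so far (M-step), so the memory never grows beyond K entries.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def soft_assign(features, bases, temperature):
    """E-step: softmax over bases of the scaled feature-base similarity."""
    logits = temperature * features @ bases.T           # (N, K)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(logits)
    return resp / resp.sum(axis=1, keepdims=True)

def update_memory(bases, acc_numer, acc_denom, features, weights,
                  num_iters=4, temperature=20.0):
    """Fold one frame into the memory (hypothetical helper).

    acc_numer (K, C) and acc_denom (K,) hold weighted statistics from
    earlier frames; the bases keep a fixed shape (K, C).
    """
    for _ in range(num_iters):
        resp = soft_assign(features, bases, temperature)  # (N, K)
        weighted = resp * weights[:, None]                # apply per-feature weights
        # M-step: combine this frame's statistics with the accumulated ones.
        numer = acc_numer + weighted.T @ features         # (K, C)
        denom = acc_denom + weighted.sum(axis=0)          # (K,)
        bases = l2_normalize(numer / (denom[:, None] + 1e-8))
    return bases, numer, denom

# Toy usage: 5 frames of 50 random unit-norm features, K = 4 bases, C = 8 channels.
rng = np.random.default_rng(0)
K, C, N = 4, 8, 50
bases = l2_normalize(rng.normal(size=(K, C)))
acc_numer = np.zeros((K, C))
acc_denom = np.zeros(K)
for _ in range(5):
    frame_features = l2_normalize(rng.normal(size=(N, C)))
    frame_weights = np.ones(N)  # assumption: uniform weights; the paper uses adaptive ones
    bases, acc_numer, acc_denom = update_memory(
        bases, acc_numer, acc_denom, frame_features, frame_weights)
```

In this sketch the cost of matching a query frame against the memory depends only on K, not on the number of frames seen, which is the property the abstract refers to as stable inference complexity.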
Pages: 1352-1362
Number of pages: 11