Siamese Dynamic Mask Estimation Network for Fast Video Object Segmentation

Cited by: 1

Authors:

Hong, Dexiang [1]

Li, Guorong [1]

Xu, Kai [1]

Su, Li [1]

Huang, Qingming [1]

Affiliations:

[1] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing, Peoples R China

Source:

2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2021

Funding:

National Natural Science Foundation of China;

Keywords:

TRACKING;

DOI:

10.1109/ICPR48806.2021.9412609

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video object segmentation (VOS) has been a fundamental topic in recent years, and many deep learning-based methods have achieved state-of-the-art performance on multiple benchmarks. However, most of these methods rely on pixel-level matching between the template and the searched frames over the whole image, even though the targets occupy only a small region. Computing over the entire image adds substantial computational cost. Moreover, the whole image may contain distracting information that produces many false-positive matching points. To address these issues, motivated by one-stage instance segmentation methods, we propose an efficient Siamese dynamic mask estimation network for fast video object segmentation. VOS is decoupled into two tasks: mask feature learning and dynamic kernel prediction. The former learns high-quality features that preserve structural geometric information; the latter learns a dynamic kernel that is convolved with the mask feature to generate the output mask. We use a Siamese neural network as the feature extractor and predict masks directly after correlation. This avoids pixel-level matching and makes our framework simpler and more efficient. Experimental results on the DAVIS 2016/2017 datasets show that our proposed method runs at 35 frames per second on an NVIDIA RTX TITAN while preserving competitive accuracy.
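
The step in which a dynamic kernel is convolved with the mask feature can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the kernel size, so the example assumes a 1x1 kernel with one weight per feature channel plus a bias, followed by a sigmoid and a 0.5 threshold. The function names `apply_dynamic_kernel` and `binarize`, the toy feature values, and the kernel weights are all hypothetical. In the paper's setting the kernel would be predicted by the network for each target, whereas here it is hard-coded.

```python
import math


def apply_dynamic_kernel(mask_features, kernel_weights, bias=0.0):
    """Apply an assumed 1x1 dynamic kernel to a (C, H, W) mask feature.

    mask_features: list of C channel maps, each a list of H rows of W floats.
    kernel_weights: list of C floats standing in for the predicted kernel.
    Returns an H x W map of foreground probabilities.
    """
    channels = len(mask_features)
    if channels != len(kernel_weights):
        raise ValueError("kernel must have one weight per feature channel")
    height, width = len(mask_features[0]), len(mask_features[0][0])
    probs = []
    for y in range(height):
        row = []
        for x in range(width):
            # A 1x1 convolution is a dot product over channels at each pixel.
            logit = bias + sum(
                kernel_weights[c] * mask_features[c][y][x]
                for c in range(channels)
            )
            row.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid
        probs.append(row)
    return probs


def binarize(probs, threshold=0.5):
    """Turn foreground probabilities into a 0/1 mask."""
    return [[1 if p > threshold else 0 for p in row] for row in probs]


# Toy input (hypothetical values): 2 channels over a 2 x 3 feature map.
features = [
    [[1.0, 0.0, 2.0], [0.0, 0.0, 1.0]],
    [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
]
kernel = [2.0, -2.0]  # stands in for the kernel a network would predict
mask = binarize(apply_dynamic_kernel(features, kernel, bias=-1.0))
print(mask)
```

Because the kernel is applied to a shared mask feature, changing only `kernel` yields a different mask from the same features, which is the property that lets such a design skip pixel-level matching between template and search frames.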

References

Pages: 9476-9482

Number of pages: 7

37 references in total

[1] CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF
Bao, Linchao
Wu, Baoyuan
Liu, Wei
[J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5977 - 5986
[2] Fully-Convolutional Siamese Networks for Object Tracking
Bertinetto, Luca
Valmadre, Jack
Henriques, Joao F.
Vedaldi, Andrea
Torr, Philip H. S.
[J]. COMPUTER VISION - ECCV 2016 WORKSHOPS, PT II, 2016, 9914 : 850 - 865
[3] Bolya D., 2019, 2019 IEEECVF INT C C
[4] One-Shot Video Object Segmentation
Caelles, S.
Maninis, K. -K.
Pont-Tuset, J.
Leal-Taixe, L.
Cremers, D.
Van Gool, L.
[J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 5320 - 5329
[5] Chen H., 2020, P IEEE CVF C COMP VI, P8573, DOI 10.1109/CVPR42600.2020.00860
[6] Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning
Chen, Yuhua
Pont-Tuset, Jordi
Montes, Alberto
Van Gool, Luc
[J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 1189 - 1198
[7] Fast and Accurate Online Video Object Segmentation via Tracking Parts
Cheng, Jingchun
Tsai, Yi-Hsuan
Hung, Wei-Chih
Wang, Shengjin
Yang, Ming-Hsuan
[J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7415 - 7424
[8] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[9] He K., 2017, IEEE I CONF COMP VIS, P2961, DOI 10.1109/ICCV.2017.322
[10] VideoMatch: Matching Based Video Object Segmentation
Hu, Yuan-Ting
Huang, Jia-Bin
Schwing, Alexander G.
[J]. COMPUTER VISION - ECCV 2018, PT VIII, 2018, 11212 : 56 - 73
