Siamese Dynamic Mask Estimation Network for Fast Video Object Segmentation

Cited by: 1
Authors
Hong, Dexiang [1]
Li, Guorong [1]
Xu, Kai [1]
Su, Li [1]
Huang, Qingming [1]
Affiliations
[1] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing, Peoples R China
Source
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2021
Funding
National Natural Science Foundation of China;
Keywords
TRACKING;
DOI
10.1109/ICPR48806.2021.9412609
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video object segmentation (VOS) has been a fundamental topic in recent years, and many deep learning-based methods have achieved state-of-the-art performance on multiple benchmarks. However, most of these methods rely on pixel-level matching between the template and the search frames over the whole image, even though the targets occupy only a small region. Computing over the entire image adds substantial computational cost. In addition, the whole image may contain distracting information, which produces many false-positive matches. To address these issues, and motivated by one-stage instance segmentation methods, we propose an efficient Siamese dynamic mask estimation network for fast video object segmentation. VOS is decoupled into two tasks: mask feature learning and dynamic kernel prediction. The former learns high-quality features that preserve structural geometric information; the latter learns a dynamic kernel that is convolved with the mask feature to generate the mask output. We use a Siamese neural network as the feature extractor and predict masks directly after correlation. This avoids pixel-level matching and makes the framework simpler and more efficient. Experimental results on the DAVIS 2016/2017 datasets show that the proposed method runs at 35 frames per second on an NVIDIA TITAN RTX while preserving competitive accuracy.
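The step in which a dynamic kernel is convolved with the mask feature can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 1x1 kernel (one weight per channel), plain nested lists instead of tensors, and hand-picked toy values. In the network described above the kernel would be predicted from the correlated Siamese features, whereas here it is simply passed in. The names `apply_dynamic_kernel` and `binarize` are hypothetical.

```python
import math


def apply_dynamic_kernel(mask_feature, kernel, bias=0.0):
    """Apply a per-target 1x1 kernel to a C x H x W feature map.

    Returns an H x W grid of foreground probabilities (sigmoid of the logits).
    Assumption: the dynamic kernel is 1x1, i.e. one weight per channel.
    """
    channels = len(mask_feature)
    if len(kernel) != channels:
        raise ValueError("kernel needs one weight per feature channel")
    height = len(mask_feature[0])
    width = len(mask_feature[0][0])
    probs = []
    for y in range(height):
        row = []
        for x in range(width):
            # A 1x1 convolution is a weighted sum over channels at each pixel.
            logit = bias + sum(
                kernel[c] * mask_feature[c][y][x] for c in range(channels)
            )
            row.append(1.0 / (1.0 + math.exp(-logit)))
        probs.append(row)
    return probs


def binarize(probs, threshold=0.5):
    """Turn probabilities into a 0/1 mask (strictly above the threshold)."""
    return [[1 if p > threshold else 0 for p in row] for row in probs]


# Toy mask feature with 2 channels on a 2 x 3 grid (hypothetical values).
mask_feature = [
    [[1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]],
]

# Two different kernels stand in for two different targets.
kernel_a = [4.0, -4.0]
kernel_b = [-4.0, 4.0]

mask_a = binarize(apply_dynamic_kernel(mask_feature, kernel_a))
mask_b = binarize(apply_dynamic_kernel(mask_feature, kernel_b))
print(mask_a)
print(mask_b)
```

The same mask feature yields different masks under different kernels, which is what lets the shared feature be computed once while only the small kernel changes per target.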
Pages: 9476-9482
Number of pages: 7
Related papers
37 in total
  • [1] CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF
    Bao, Linchao
    Wu, Baoyuan
    Liu, Wei
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5977 - 5986
  • [2] Fully-Convolutional Siamese Networks for Object Tracking
    Bertinetto, Luca
    Valmadre, Jack
    Henriques, Joao F.
    Vedaldi, Andrea
    Torr, Philip H. S.
    [J]. COMPUTER VISION - ECCV 2016 WORKSHOPS, PT II, 2016, 9914 : 850 - 865
  • [3] Bolya D., 2019, 2019 IEEECVF INT C C
  • [4] One-Shot Video Object Segmentation
    Caelles, S.
    Maninis, K. -K.
    Pont-Tuset, J.
    Leal-Taixe, L.
    Cremers, D.
    Van Gool, L.
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 5320 - 5329
  • [5] Chen H., 2020, P IEEE CVF C COMP VI, P8573, DOI 10.1109/CVPR42600.2020.00860
  • [6] Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning
    Chen, Yuhua
    Pont-Tuset, Jordi
    Montes, Alberto
    Van Gool, Luc
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 1189 - 1198
  • [7] Fast and Accurate Online Video Object Segmentation via Tracking Parts
    Cheng, Jingchun
    Tsai, Yi-Hsuan
    Hung, Wei-Chih
    Wang, Shengjin
    Yang, Ming-Hsuan
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7415 - 7424
  • [8] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
  • [9] He K., 2017, IEEE I CONF COMP VIS, P2961, DOI 10.1109/ICCV.2017.322
  • [10] VideoMatch: Matching Based Video Object Segmentation
    Hu, Yuan-Ting
    Huang, Jia-Bin
    Schwing, Alexander G.
    [J]. COMPUTER VISION - ECCV 2018, PT VIII, 2018, 11212 : 56 - 73