RANet: Ranking Attention Network for Fast Video Object Segmentation

被引:160
作者
Wang, Ziqin [1 ,3 ]
Xu, Jun [2 ,4 ]
Liu, Li [2 ]
Zhu, Fan [2 ]
Shao, Ling [2 ]
机构
[1] Univ Sydney, Sydney, NSW, Australia
[2] Incept Inst Artificial Intelligence IIAI, Abu Dhabi, U Arab Emirates
[3] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian, Peoples R China
[4] Nankai Univ, Coll Comp Sci, Media Comp Lab, Tianjin, Peoples R China
来源
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019年
关键词
D O I
10.1109/ICCV.2019.00408
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Despite online learning (OL) techniques have boosted the performance of semi-supervised video object segmentation (VOS) methods, the huge time costs of OL greatly restricts their practicality. Matching based and propagation based methods run at a faster speed by avoiding OL techniques. However, they are limited by sub-optimal accuracy, due to mismatching and drifting problems. In this paper, we develop a real-time yet very accurate Ranking Attention Network (RANet) for VOS. Specifically, to integrate the insights of matching based and propagation based methods, we employ an encoder-decoder framework to learn pixel-level similarity and segmentation in an end-to-end manner. To better utilize the similarity maps, we propose a novel ranking attention module, which automatically ranks and selects these maps for fine-grained VOS performance. Experiments on DAVIS(16) and DAVIS(17) datasets show that our RANet achieves the best speed-accuracy trade-off, e.g., with 33 milliseconds per frame and J&F=85:5% on DAVIS(16). With OL, our RANet reaches J&F=87:1% on DAVIS(16), exceeding state-of-the-art VOS methods. The code can be found at https://github.com/Storife/RANet.
引用
收藏
页码:3977 / 3986
页数:10
相关论文
共 61 条
[1]  
[Anonymous], 2016, CVPR
[2]  
[Anonymous], 2017, arXiv preprint arXiv:1706.05587, DOI DOI 10.48550/ARXIV.1706.05587
[3]  
[Anonymous], 2017, CVPR, DOI DOI 10.1109/CVPR.2017.790
[4]  
[Anonymous], 2018 DAVIS CHALL VID
[5]  
[Anonymous], 2018, 1 LARG SCAL VID OBJ
[6]   CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF [J].
Bao, Linchao ;
Wu, Baoyuan ;
Liu, Wei .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :5977-5986
[7]   Fully-Convolutional Siamese Networks for Object Tracking [J].
Bertinetto, Luca ;
Valmadre, Jack ;
Henriques, Joao F. ;
Vedaldi, Andrea ;
Torr, Philip H. S. .
COMPUTER VISION - ECCV 2016 WORKSHOPS, PT II, 2016, 9914 :850-865
[8]   One-Shot Video Object Segmentation [J].
Caelles, S. ;
Maninis, K. -K. ;
Pont-Tuset, J. ;
Leal-Taixe, L. ;
Cremers, D. ;
Van Gool, L. .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :5320-5329
[9]  
Caelles Sergi, 2018, ARXIV180300557
[10]   DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J].
Chen, Liang-Chieh ;
Papandreou, George ;
Kokkinos, Iasonas ;
Murphy, Kevin ;
Yuille, Alan L. .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) :834-848