RANet: Ranking Attention Network for Fast Video Object Segmentation

Cited by: 160

Authors:

Wang, Ziqin [1,3]

Xu, Jun [2,4]

Liu, Li [2]

Zhu, Fan [2]

Shao, Ling [2]

Affiliations:

[1] Univ Sydney, Sydney, NSW, Australia

[2] Incept Inst Artificial Intelligence IIAI, Abu Dhabi, U Arab Emirates

[3] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian, Peoples R China

[4] Nankai Univ, Coll Comp Sci, Media Comp Lab, Tianjin, Peoples R China

Source:

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019

DOI:

10.1109/ICCV.2019.00408

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Although online learning (OL) techniques have boosted the performance of semi-supervised video object segmentation (VOS) methods, the huge time cost of OL greatly restricts their practicality. Matching-based and propagation-based methods run faster by avoiding OL techniques. However, they are limited by sub-optimal accuracy due to mismatching and drifting problems. In this paper, we develop a real-time yet very accurate Ranking Attention Network (RANet) for VOS. Specifically, to integrate the insights of matching-based and propagation-based methods, we employ an encoder-decoder framework to learn pixel-level similarity and segmentation in an end-to-end manner. To better utilize the similarity maps, we propose a novel ranking attention module, which automatically ranks and selects these maps for fine-grained VOS performance. Experiments on the DAVIS_{16} and DAVIS_{17} datasets show that our RANet achieves the best speed-accuracy trade-off, e.g., 33 milliseconds per frame and J&F = 85.5% on DAVIS_{16}. With OL, our RANet reaches J&F = 87.1% on DAVIS_{16}, exceeding state-of-the-art VOS methods. The code can be found at https://github.com/Storife/RANet.
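
Illustrative note (not part of the original record): the abstract's idea of computing pixel-level similarity maps and then ranking and selecting them can be sketched roughly as below. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function names `similarity_maps` and `rank_and_select`, the use of cosine similarity, the per-map maximum as the ranking score, and zero-padding to a fixed count `k` are all assumptions made for illustration; the abstract states only that the maps are ranked and selected automatically.

```python
import numpy as np

def similarity_maps(template_feat, current_feat, fg_mask):
    """Cosine similarity between template foreground pixels and the current frame.

    template_feat, current_feat: arrays of shape (C, H, W) (hypothetical encoder outputs).
    fg_mask: boolean array of shape (H, W) marking template foreground pixels.
    Returns an array of shape (N, H, W), one map per foreground template pixel.
    """
    c, h, w = current_feat.shape
    t = template_feat.reshape(c, -1)[:, fg_mask.reshape(-1)]  # (C, N)
    q = current_feat.reshape(c, -1)                           # (C, H*W)
    # L2-normalize each feature vector so the dot product is a cosine similarity.
    t = t / (np.linalg.norm(t, axis=0, keepdims=True) + 1e-8)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    return (t.T @ q).reshape(-1, h, w)

def rank_and_select(maps, k):
    """Sort maps by a score (assumed here: per-map maximum), keep the top k.

    If fewer than k maps exist, zero maps are appended so the output always
    has shape (k, H, W), giving a downstream decoder a fixed channel count.
    """
    scores = maps.reshape(maps.shape[0], -1).max(axis=1)
    order = np.argsort(-scores, kind="stable")[:k]  # descending by score
    selected = maps[order]
    if selected.shape[0] < k:
        pad_shape = (k - selected.shape[0],) + maps.shape[1:]
        selected = np.concatenate(
            [selected, np.zeros(pad_shape, dtype=maps.dtype)], axis=0
        )
    return selected
```

The point of the fixed-size output is that the number of foreground pixels varies from video to video, whereas a convolutional decoder expects a constant number of input channels; ranking before truncation keeps the maps judged most informative by the chosen score.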

Pages: 3977-3986

Page count: 10
