Fast Video Object Segmentation with Temporal Aggregation Network and Dynamic Template Matching

被引：19

作者：

Huang, Xuhua ^{[1
]}

Xu, Jiarui ^{[1
]}

Tai, Yu-Wing ^{[2
]}

Tang, Chi-Keung ^{[1
]}

机构：

[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China

[2] Tencent, Shenzhen, Peoples R China

来源：

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020年

关键词：

D O I：

10.1109/CVPR42600.2020.00890

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Significant progress has been made in Video Object Segmentation (VOS), the video object tracking task in its finest level. While the VOS task can be naturally de-coupled into image semantic segmentation and video object tracking, significantly much more research effort has been made in segmentation than tracking. In this paper, we introduce "tracking-by-detection" into VOS which can coherently integrates segmentation into tracking, by proposing a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance. Notably, our method is entirely online and thus suitable for one-shot learning, and our end-to-end trainable model allows multiple object segmentation in one forward pass. We achieve new state-of-the-art performance on the DAVIS benchmark without complicated bells and whistles in both speed and accuracy, with a speed of 0.14 second per frame and J&F measure of 75.9% respectively. Project page is available at https://xuhuaking.github.io/Fast-VOS-DTTM-TAN/.

引用

页码：8876 / 8886

页数：11

共 64 条

[1]

[Anonymous], PROC CVPR IEEE

[2]

[Anonymous], 2019, BuildingRESTAPIswithFlask: CreatePythonWebServiceswithMySQL, DOI DOI 10.1007/978-1

[3] CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF [J].

Bao, Linchao ;

Wu, Baoyuan ;

Liu, Wei .

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :5977-5986

[4] Tracking without bells and whistles [J].

Bergmann, Philipp ;

Meinhardt, Tim ;

Leal-Taixe, Laura .

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :941-951

[5]

Bochinski Erik, 2017, 2017 14th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), DOI 10.1109/AVSS.2017.8078516

[6] One-Shot Video Object Segmentation [J].

Caelles, S. ;

Maninis, K. -K. ;

Pont-Tuset, J. ;

Leal-Taixe, L. ;

Cremers, D. ;

Van Gool, L. .

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :5320-5329

[7]

Caelles Sergi, 2018, arXiv

[8] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J].

Carreira, Joao ;

Zisserman, Andrew .

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :4724-4733

[9] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J].

Chen, Liang-Chieh ;

Papandreou, George ;

Kokkinos, Iasonas ;

Murphy, Kevin ;

Yuille, Alan L. .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) :834-848

[10] Attention to Scale: Scale-aware Semantic Image Segmentation [J].

Chen, Liang-Chieh ;

Yang, Yi ;

Wang, Jiang ;

Xu, Wei ;

Yuille, Alan L. .

2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :3640-3649

← 1 2 3 4 5 6 7 →