Learning What to Learn for Video Object Segmentation

Cited by: 131

Authors:

Bhat, Goutam [1]

Lawin, Felix Jaremo [2]

Danelljan, Martin [1]

Robinson, Andreas [2]

Felsberg, Michael [2]

Van Gool, Luc [1]

Timofte, Radu [1]

Affiliations:

[1] Swiss Federal Institute of Technology, CVL, Zurich, Switzerland

[2] Linköping University, CVL, Linköping, Sweden

Source:

Computer Vision - ECCV 2020, Part II | 2020 / Volume 12347

DOI:

10.1007/978-3-030-58536-5_46

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined by a first-frame reference mask during inference. The problem of how to capture and utilize this limited information to accurately segment the target remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learner. Our learner is designed to predict a powerful parametric model of the target by minimizing a segmentation error in the first frame. We further go beyond the standard few-shot learning paradigm by learning what our target model should learn in order to maximize segmentation accuracy. We perform extensive experiments on standard benchmarks. Our approach sets a new state of the art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5, corresponding to a 2.6% relative improvement over the previous best result. The code and models are available at https://github.com/visionml/pytracking.
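As a quick reading aid for the numbers in the abstract, the sketch below back-computes the previous best score implied by an overall score of 81.5 and a 2.6% relative improvement. The function names are hypothetical helpers introduced only for illustration, and the result is an implied value derived from the two rounded figures above, not a number taken from the paper's tables.

```python
def implied_baseline(new_score: float, relative_gain_pct: float) -> float:
    """Previous score implied by a new score and a relative gain given in percent."""
    return new_score / (1.0 + relative_gain_pct / 100.0)


def relative_gain_pct(new_score: float, old_score: float) -> float:
    """Relative improvement of new_score over old_score, in percent."""
    return 100.0 * (new_score - old_score) / old_score


# Figures quoted in the abstract: overall score 81.5, relative gain 2.6%.
baseline = implied_baseline(81.5, 2.6)
print(f"implied previous best: {baseline:.1f}")         # about 79.4
print(f"absolute gain: {81.5 - baseline:.1f} points")   # about 2.1
```

Under these assumptions, a 2.6% relative improvement corresponds to an absolute gain of roughly two points on the overall score, which is worth keeping in mind when comparing it with absolute differences reported elsewhere.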

Pages: 777-794

Page count: 18
