Learning What to Learn for Video Object Segmentation

Cited by: 131
Authors
Bhat, Goutam [1]
Lawin, Felix Jaremo [2]
Danelljan, Martin [1]
Robinson, Andreas [2]
Felsberg, Michael [2]
Van Gool, Luc [1]
Timofte, Radu [1]
Affiliations
[1] Swiss Federal Institute of Technology (ETH Zurich), CVL, Zurich, Switzerland
[2] Linköping University, CVL, Linköping, Sweden
Source
COMPUTER VISION - ECCV 2020, PT II | 2020 / Vol. 12347
DOI
10.1007/978-3-030-58536-5_46
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined by a first-frame reference mask during inference. The problem of how to capture and utilize this limited information to accurately segment the target remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learner. Our learner is designed to predict a powerful parametric model of the target by minimizing a segmentation error in the first frame. We further go beyond the standard few-shot learning paradigm by learning what our target model should learn in order to maximize segmentation accuracy. We perform extensive experiments on standard benchmarks. Our approach sets a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5, corresponding to a 2.6% relative improvement over the previous best result. The code and models are available at https://github.com/visionml/pytracking.
Pages: 777-794
Page count: 18
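
The abstract describes a few-shot learner that predicts a parametric target model by minimizing a segmentation error on the first frame. The following is a minimal sketch of that general idea only, not the authors' implementation: it assumes a linear per-pixel target model, a squared-error loss with a small regularizer, and steepest descent with an exact line search. The function names (`fit_target_model`, `segment`), the feature layout, and all parameter values are hypothetical choices made for illustration. The learned components the abstract refers to as "learning what to learn" are not modeled here.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def fit_target_model(features, labels, num_steps=20, reg=1e-2):
    """Fit a linear target model on first-frame pixels (illustrative assumption).

    features: list of N per-pixel feature vectors, each of length C
    labels:   list of N values in {0, 1} taken from the reference mask
    Minimizes 0.5 * sum((x . w - y)^2) + 0.5 * reg * |w|^2 by steepest descent.
    """
    num_channels = len(features[0])
    w = [0.0] * num_channels
    for _ in range(num_steps):
        residual = [dot(x, w) - y for x, y in zip(features, labels)]
        # Gradient of the regularized squared error with respect to w.
        grad = [
            sum(r * x[c] for r, x in zip(residual, features)) + reg * w[c]
            for c in range(num_channels)
        ]
        grad_sq = dot(grad, grad)
        # Curvature along the gradient direction; gives the exact step length
        # for a quadratic objective.
        x_grad = [dot(x, grad) for x in features]
        curvature = dot(x_grad, x_grad) + reg * grad_sq
        if curvature < 1e-12:
            break
        step = grad_sq / curvature
        w = [wc - step * gc for wc, gc in zip(w, grad)]
    return w


def segment(features, w, threshold=0.5):
    """Apply the fitted target model to the pixels of a later frame."""
    return [1 if dot(x, w) > threshold else 0 for x in features]
```

In this sketch the model is fitted once on the first frame and then applied unchanged to later frames. Unlike the architecture described in the abstract, nothing here is trained end to end.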