An automatic quality evaluator for video object segmentation masks

Times cited: 5
Authors
Cheng, Jingchun [1]
Song, Jiajie [1]
Xiong, Rui [1]
Pan, Xiong [1]
Zhang, Chunxi [1]
Affiliation
[1] Beihang University, Beijing, People's Republic of China
Keywords
Mask quality estimation; Video object segmentation; Objective quality prediction; Deep learning
DOI
10.1016/j.measurement.2022.111003
Chinese Library Classification
T [Industrial Technology]
Subject classification code
08
Abstract
Video object segmentation (VOS) has been a research hotspot in recent years. However, evaluating the performance of different VOS methods requires manually labeled mask annotations, which are labor-intensive and time-consuming to produce, making it hard to validate algorithm quality in field tests. In this paper, we tackle the problem of automatically measuring mask quality for video object segmentation without access to manual annotations. We argue that, with a carefully designed network structure, quality-sensitive features can be extracted to predict mask quality scores without ground-truth labels. To this end, we train an end-to-end convolutional neural network that captures quality-sensitive features using both a spatial reference and a temporal reference. In the proposed Video Object Segmentation Evaluation Network (VOSE-Net), the corresponding video frame serves as the spatial reference and motion amplitude information serves as the temporal reference. Instead of directly concatenating the features of the mask and the references, we extract spatial quality cues through feature correlation, which is more principled and more effective for this specific task. Taking as input the segmented mask, its corresponding frame image, and its optical flow map, VOSE-Net provides an accurate quality estimate without human intervention. To train and verify the proposed network, we construct a new dataset from the DAVIS video segmentation benchmark and the results of many public video object segmentation algorithms. We also demonstrate the robustness and usefulness of the proposed method in several applications, namely proposal selection, parameter optimization, and arbitrary video mask evaluation. The experimental results and analysis show that VOSE-Net is fast, effective, and of practical use.
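The abstract does not say which quality score VOSE-Net regresses. Since the training set is built from DAVIS, one natural candidate is the DAVIS region similarity J, i.e. the intersection-over-union (IoU) between a predicted mask and its ground truth. The sketch below assumes that choice: the hypothetical `region_similarity` computes J for binary masks (treating two empty masks as a perfect match, an assumed convention), and the hypothetical `select_proposal` shows how per-mask quality scores would drive the proposal-selection application. Neither function is taken from the paper.

```python
from typing import Callable, List, Sequence

# A binary mask is a grid of 0/1 values (rows of pixels).
Mask = Sequence[Sequence[int]]


def region_similarity(pred: Mask, gt: Mask) -> float:
    """Intersection-over-union of two binary masks (DAVIS-style J measure).

    Assumed convention: two empty masks count as a perfect match (1.0).
    """
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            if p and g:
                inter += 1  # pixel is foreground in both masks
            if p or g:
                union += 1  # pixel is foreground in at least one mask
    return 1.0 if union == 0 else inter / union


def select_proposal(proposals: List[Mask], score_fn: Callable[[Mask], float]) -> int:
    """Index of the proposal with the highest quality score (first one on ties).

    score_fn stands in for any scorer: an oracle using ground truth, or a
    learned, annotation-free quality estimator (hypothetical here).
    """
    if not proposals:
        raise ValueError("no proposals given")
    return max(range(len(proposals)), key=lambda i: score_fn(proposals[i]))


# Usage: two candidate masks for one 3x3 frame, scored against ground truth.
gt = [[0, 1, 1],
      [0, 1, 1],
      [0, 0, 0]]
a = [[0, 1, 1],
     [0, 1, 0],
     [0, 0, 0]]  # 3 overlapping pixels, union of 4
b = [[1, 1, 1],
     [1, 1, 1],
     [0, 0, 0]]  # 4 overlapping pixels, union of 6

print(region_similarity(a, gt))  # 0.75
print(select_proposal([a, b], lambda m: region_similarity(m, gt)))  # 0
```

Scoring against ground truth, as in the usage lines, makes the scorer an oracle. In the annotation-free setting the abstract describes, that oracle would be replaced by a score predicted from the mask, the frame image, and the optical flow map alone; the selection step itself would not change.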
Pages: 11
References
75 in total