Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation

Cited by: 118
Authors
Bertasius, Gedas [1]
Torresani, Lorenzo [1]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025 USA
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020
DOI
10.1109/CVPR42600.2020.00976
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce a method for simultaneously classifying, segmenting and tracking object instances in a video sequence. Our method, named MaskProp, adapts the popular Mask R-CNN to video by adding a mask propagation branch that propagates frame-level object instance masks from each video frame to all the other frames in a video clip. This allows our system to predict clip-level instance tracks with respect to the object instances segmented in the middle frame of the clip. Clip-level instance tracks generated densely for each frame in the sequence are finally aggregated to produce video-level object instance segmentation and classification. Our experiments demonstrate that our clip-level instance segmentation makes our approach robust to motion blur and object occlusions in video. MaskProp achieves the best reported accuracy on the YouTube-VIS dataset, outperforming the ICCV 2019 video instance segmentation challenge winner despite being much simpler and using orders of magnitude less labeled data (1.3M vs 1B images and 860K vs 14M bounding boxes). The project page is at: https://gberta.github.io/maskprop/.
Pages: 9736-9745
Number of pages: 10