Learning to Segment via Cut-and-Paste

Times cited: 60
Authors
Remez, Tal [1]
Huang, Jonathan [2]
Brown, Matthew [2]
Affiliations
[1] Google, Tel Aviv, Israel
[2] Google, Seattle, WA USA
Source
COMPUTER VISION - ECCV 2018, PT VII | 2018 / Vol. 11211
Keywords
Instance segmentation; Weakly-supervised; Deep-learning
DOI
10.1007/978-3-030-01234-2_3
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a weakly-supervised approach to object instance segmentation. Starting with known or predicted object bounding boxes, we learn object masks by playing a game of cut-and-paste in an adversarial learning setup. A mask generator takes a detection box and Faster R-CNN features, and constructs a segmentation mask that is used to cut-and-paste the object into a new image location. The discriminator tries to distinguish between real objects and those cut and pasted via the generator, giving a learning signal that leads to improved object masks. We verify our method experimentally using Cityscapes, COCO, and aerial image datasets, learning to segment objects without ever having seen a mask in training. Our method exceeds the performance of existing weakly supervised methods, without requiring hand-tuned segment proposals, and reaches 90% of supervised performance.
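The cut-and-paste step described in the abstract amounts to alpha compositing: the generator's mask decides, pixel by pixel, how much of the boxed object replaces the background at the new location. The following is a minimal Python sketch under simplifying assumptions (grayscale images stored as nested lists, a soft mask in [0, 1] already cropped to the detection box, and a paste offset chosen by the caller). The function name `cut_and_paste` and its parameters are hypothetical and are not taken from the paper's code.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of pixel intensities.
Image = List[List[float]]


def cut_and_paste(source: Image, mask: Image, background: Image,
                  top: int, left: int) -> Image:
    """Alpha-composite the `source` crop onto a copy of `background`.

    Assumptions (illustrative only): `source` and `mask` have the same
    height and width (the cropped detection box), mask values lie in
    [0, 1], and (top, left) is the paste location, which must fit
    entirely inside `background`.
    """
    h, w = len(source), len(source[0])
    if len(mask) != h or any(len(row) != w for row in mask):
        raise ValueError("mask must match the source crop size")
    if (top < 0 or left < 0
            or top + h > len(background) or left + w > len(background[0])):
        raise ValueError("paste region falls outside the background")

    out = [row[:] for row in background]  # copy so the input is untouched
    for y in range(h):
        for x in range(w):
            m = mask[y][x]
            # The mask selects the object; (1 - m) keeps the background pixel.
            out[top + y][left + x] = (
                m * source[y][x] + (1.0 - m) * background[top + y][left + x]
            )
    return out


if __name__ == "__main__":
    bg = [[0.0] * 4 for _ in range(3)]          # 3x4 black background
    crop = [[1.0, 1.0], [1.0, 1.0]]             # 2x2 bright object crop
    soft_mask = [[1.0, 0.5], [0.0, 1.0]]        # generator output (assumed)
    for row in cut_and_paste(crop, soft_mask, bg, top=1, left=2):
        print(row)
```

In this example, pasting the 2×2 crop at row 1, column 2 yields `[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.5], [0.0, 0.0, 0.0, 1.0]]`: fully masked pixels take the object value, the half-masked pixel blends object and background, and the unmasked pixel keeps the background. In an adversarial loop of the kind the abstract describes, images composited this way would be what the discriminator tries to tell apart from real objects; the sketch deliberately omits the generator, the discriminator, and any gradient computation.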
Pages: 39-54
Number of pages: 16