Learning to Detect Salient Objects with Image-level Supervision

Cited by: 878
Authors
Wang, Lijun [1]
Lu, Huchuan [1]
Wang, Yifan [1]
Feng, Mengyang [1]
Wang, Dong [1]
Yin, Baocai [1]
Ruan, Xiang [2]
Affiliations
[1] Dalian University of Technology, Dalian, People's Republic of China
[2] Tiwaki Co Ltd, Tokyo, Japan
Source
30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017) | 2017
DOI
10.1109/CVPR.2017.404
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep Neural Networks (DNNs) have substantially improved the state of the art in salient object detection. However, training DNNs requires costly pixel-level annotations. In this paper, we leverage the observation that image-level tags provide important cues of foreground salient objects, and develop a weakly supervised learning method for saliency detection using image-level tags only. The Foreground Inference Network (FIN) is introduced for this challenging task. In the first stage of our training method, FIN is jointly trained with a fully convolutional network (FCN) for image-level tag prediction. A global smooth pooling layer is proposed, enabling FCN to assign object category tags to corresponding object regions, while FIN is capable of capturing all potential foreground regions with the predicted saliency maps. In the second stage, FIN is fine-tuned with its predicted saliency maps as ground truth. For refinement of ground truth, an iterative Conditional Random Field is developed to enforce spatial label consistency and further boost performance. Our method alleviates annotation efforts and allows the usage of existing large-scale training sets with image-level tags. Our model runs at 60 FPS, outperforms unsupervised ones by a large margin, and achieves performance comparable or even superior to that of fully supervised counterparts.
Pages: 3796-3805
Page count: 10