Contrastive and consistent feature learning for weakly supervised object localization and semantic segmentation

Cited by: 5
Authors
Ki, Minsong [1]
Uh, Youngjung [2]
Lee, Wonyoung [3]
Byun, Hyeran [1,3]
Affiliations
[1] Yonsei Univ, Dept Comp Sci, Seoul, South Korea
[2] Yonsei Univ, Dept Appl Informat Engn, Seoul, South Korea
[3] Yonsei Univ, Grad Sch Artificial Intelligence, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Weakly supervised learning; Localization; Segmentation; Contrastive learning; Foreground consistency;
DOI
10.1016/j.neucom.2021.03.023
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Weakly supervised learning attempts to construct predictive models by learning with weak supervision. In this paper, we concentrate on weakly supervised object localization and semantic segmentation tasks. Existing methods are limited to focusing on narrow discriminative parts or overextending the activations to less discriminative regions even on backgrounds. To mitigate these problems, we regard the background as an important cue that guides the feature activation to cover the entire object to the right extent, and propose two novel objective functions: 1) contrastive attention loss and 2) foreground consistency loss. Contrastive attention loss draws the foreground feature and its dropped version close together and pushes the dropped foreground feature away from the background feature. Foreground consistency loss favors agreement between layers and provides early layers with a sense of objectness. Using both losses leads to balanced improvements over localization and segmentation accuracy by boosting activations on less discriminative regions but restraining the activation in the target object extent. For better optimizing the above losses, we use the non-local attention blocks to replace channel-pooled attention leading to enhanced attention maps considering the spatial similarity. Finally, our method achieves state-of-the-art localization performance on CUB-200-2011, ImageNet, and OpenImages benchmarks regarding top-1 localization accuracy, MaxBoxAccV2, and PxAP. We also demonstrate the effectiveness of our method in improving segmentation performance measured by mIoU on the PASCAL VOC dataset. (C) 2021 Elsevier B.V. All rights reserved.
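The two objectives described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the contrastive attention loss is written as a triplet-style hinge with Euclidean distance and a hypothetical `margin` parameter, and the foreground consistency loss is written as a mean squared difference between two same-sized attention maps. The function names are hypothetical and are not taken from the paper.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_attention_loss(fg, fg_dropped, bg, margin=1.0):
    """Assumed triplet-style hinge (not the paper's exact formula).

    Anchor: dropped foreground feature.
    Positive: foreground feature (pulled closer to the anchor).
    Negative: background feature (pushed away from the anchor).
    """
    pos = l2_distance(fg_dropped, fg)
    neg = l2_distance(fg_dropped, bg)
    return max(0.0, pos - neg + margin)


def foreground_consistency_loss(early_attention, late_attention):
    """Assumed mean squared difference between two flattened attention maps.

    A small value means the early layer agrees with the later layer
    about where the foreground is.
    """
    n = len(early_attention)
    return sum((a - b) ** 2 for a, b in zip(early_attention, late_attention)) / n


# Toy 2-D features: the dropped foreground matches the foreground exactly
# and is far from the background, so the hinge is inactive.
loss_easy = contrastive_attention_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], margin=0.5)

# The dropped foreground coincides with the background: the loss is large.
loss_hard = contrastive_attention_loss([1.0, 0.0], [0.0, 1.0], [0.0, 1.0], margin=1.0)

# Two attention maps that disagree at one of two positions.
loss_consistency = foreground_consistency_loss([0.0, 1.0], [1.0, 1.0])
```

Under these assumptions, minimizing the first term keeps the dropped foreground feature near the full foreground feature and separated from the background, while the second term penalizes disagreement between early-layer and later-layer attention.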
Pages: 244-254
Number of pages: 11