Where to Look?: Mining Complementary Image Regions for Weakly Supervised Object Localization

Cited by: 22

Authors:

Babar, Sadbhavana [1]

Das, Sukhendu [1]

Affiliations:

[1] IIT Madras, Dept Comp Sci & Engn, Visualizat & Percept Lab, Chennai, Tamil Nadu, India

Source:

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021) | 2021

Keywords:

DOI:

10.1109/WACV48630.2021.00105

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Humans possess an innate capability of recognizing objects and their corresponding parts, confining their attention to the location in a visual scene where the object is spatially present. Recently, efforts to train machines to mimic this ability in the form of weakly supervised object localization, using training labels only at the image level, have garnered a lot of attention. Nonetheless, one well-known problem that most existing methods suffer from is localizing only the most discriminative part of an object. Such methods provide very little or no focus on other pertinent parts of the object. In this paper, we propose a novel way of scrupulously localizing objects using only image-level training labels, by mining information from complementary regions in an image. First, we adopt regional dropout at complementary spatial locations to create two intermediate images. With the help of a novel Channel-wise Assisted Attention Module (CAAM) coupled with a Spatial Self-Attention Module (SSAM), we train our model in parallel to leverage the information from complementary image regions for localization. Finally, we fuse the attention maps generated by the two classifiers using our Attention-based Fusion Loss. Several experimental studies demonstrate the superior performance of our proposed approach. Our method shows a significant increase in localization performance over the existing state-of-the-art methods on the CUB-200-2011 and ILSVRC 2016 datasets.
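
The regional dropout step described in the abstract can be pictured with a small sketch. The abstract does not say how the complementary locations are chosen, so everything below is an assumption made for illustration: the function name `complementary_dropout`, the fixed patch grid, the per-patch coin flip, and zero-filling of dropped pixels are hypothetical and are not taken from the paper. The only property the sketch aims to show is that each patch survives in exactly one of the two intermediate images.

```python
import random


def complementary_dropout(image, patch=2, drop_prob=0.5, seed=0):
    """Return two copies of `image` with complementary patches zeroed.

    `image` is a 2D list (height x width) of pixel values. The patch
    grid, the drop probability, and zero-filling are illustrative
    assumptions, not details given in the abstract.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    rows = (height + patch - 1) // patch
    cols = (width + patch - 1) // patch

    # One random keep/drop decision per patch of the grid.
    keep = [[rng.random() >= drop_prob for _ in range(cols)]
            for _ in range(rows)]

    view_a = [[0] * width for _ in range(height)]
    view_b = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if keep[y // patch][x // patch]:
                view_a[y][x] = image[y][x]  # kept in A, dropped in B
            else:
                view_b[y][x] = image[y][x]  # dropped in A, kept in B
    return view_a, view_b


# Toy 4x4 "image" with nonzero pixel values 1..16.
image = [[y * 4 + x + 1 for x in range(4)] for y in range(4)]
view_a, view_b = complementary_dropout(image)
```

Because the two masks are exact complements, adding `view_a` and `view_b` pixel by pixel recovers the original image. In a setup like the one the abstract outlines, the two views would then be passed to the two classifiers whose attention maps are later fused.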


Pages: 1009-1018

Page count: 10

