Self Supervision for Attention Networks

Times Cited: 3
Authors
Patro, Badri N. [1 ,4 ]
Kasturi, G. S. [2 ,5 ]
Jain, Ansh [2 ]
Namboodiri, Vinay P. [3 ]
Affiliations
[1] IIT Kanpur, Kanpur, Uttar Pradesh, India
[2] NSUT, Delhi, India
[3] Univ Bath, Bath, Avon, England
[4] Google, Mountain View, CA 94043 USA
[5] Netaji Subhas Univ Technol, Delhi, India
Source
2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021) | 2021
DOI
10.1109/WACV48630.2021.00077
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In recent years, the attention mechanism has become a popular concept and has proven successful in many machine learning applications. However, deep learning models typically do not employ supervision for these attention mechanisms, even though such supervision can improve a model's performance significantly. Therefore, in this paper, we tackle this limitation and propose a novel method to improve the attention mechanism by inducing "self-supervision". We devise a technique to generate desirable attention maps for any model that utilizes an attention module. This is achieved by examining the model's output for different regions sampled from the input and obtaining the attention probability distributions that enhance the proficiency of the model. The attention distributions thus obtained are used for supervision. We rely on the fact that attenuating the unimportant parts allows a model to attend to more salient regions, thus strengthening prediction accuracy. The quantitative and qualitative results presented in this paper show that this method successfully improves the attention mechanism as well as the model's accuracy. In addition to the task of Visual Question Answering (VQA), we also show results on the tasks of image classification and text classification to demonstrate that our method generalizes to any vision or language model that uses an attention module.
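The abstract describes the procedure only at a high level: sample (attenuate) regions of the input, observe how the model's output changes, and convert the resulting importance scores into a target attention distribution used to supervise the attention module. The sketch below illustrates that general idea, not the authors' implementation; the classifier `model_fn`, the zero-masking of regions, and all helper names are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def region_importance(model_fn, x, masks, label):
    """Score each region by the confidence drop when that region is
    attenuated (here, simply zeroed out)."""
    base = model_fn(x)[label]
    return np.array([base - model_fn(x * m)[label] for m in masks])

def target_attention(scores, temperature=1.0):
    """Turn importance scores into a target attention distribution."""
    return softmax(scores / temperature)

def attention_kl(target, predicted, eps=1e-8):
    """KL(target || predicted), usable as an auxiliary supervision loss
    on the model's predicted attention weights."""
    return float(np.sum(target * (np.log(target + eps) - np.log(predicted + eps))))

# Toy classifier over 4 "regions": class 0 depends mostly on region 0,
# class 1 mostly on region 3 (hypothetical stand-in for a real network).
W = np.array([[2.0, 0.1, 0.1, 0.1],
              [0.1, 0.1, 0.1, 2.0]])
model_fn = lambda x: softmax(W @ x)

x = np.ones(4)
masks = [np.ones(4) for _ in range(4)]
for i, m in enumerate(masks):
    m[i] = 0.0  # mask out region i only

scores = region_importance(model_fn, x, masks, label=0)
target = target_attention(scores)  # peaks at region 0, the salient region
```

In a training loop, `attention_kl(target, predicted_attention)` would be added to the task loss so the attention module is pulled toward the regions whose removal hurts the prediction most.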
Pages: 726-735
Page count: 10