Exploring Human-Like Attention Supervision in Visual Question Answering

Cited by: 0

Authors:

Qiao, Tingting [1]

Dong, Jianfeng [1]

Xu, Duanqing [1]

Affiliations:

[1] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China

Source:

THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2018

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Attention mechanisms have been widely applied in the Visual Question Answering (VQA) task, as they help to focus on the area-of-interest of both visual and textual information. To answer the questions correctly, the model needs to selectively target different areas of an image, which suggests that an attention-based model may benefit from explicit attention supervision. In this work, we aim to address the problem of adding attention supervision to VQA models. Since there is a lack of human attention data, we first propose a Human Attention Network (HAN) to generate human-like attention maps, trained on a recently released dataset called Human ATtention Dataset (VQA-HAT). Then, we apply the pre-trained HAN to the VQA v2.0 dataset to automatically produce human-like attention maps for all image-question pairs. The generated human-like attention map dataset for VQA v2.0 is named the Human-Like ATtention (HLAT) dataset. Finally, we apply human-like attention supervision to an attention-based VQA model. The experiments show that adding human-like supervision yields more accurate attention together with better performance, showing a promising future for human-like attention supervision in VQA.
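The abstract does not state how the supervision term is formulated, so the following is only a minimal sketch of one plausible way to add attention supervision to a VQA training objective. It assumes the model attention and the human-like attention are both non-negative maps over the same image regions, and it uses a mean squared error between the normalized maps as the supervision term. The function names, the `weight` parameter, and the choice of mean squared error are illustrative assumptions, not the authors' method.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def attention_supervised_loss(answer_logits, answer_index,
                              model_attention, human_like_attention,
                              weight=1.0):
    """Answer loss plus a weighted attention supervision term.

    All names here are hypothetical. The supervision term (mean squared
    error between normalized maps) is an assumption; the abstract does
    not specify the loss actually used.
    """
    # Standard classification term: cross-entropy on the answer.
    probs = softmax(np.asarray(answer_logits, dtype=float))
    answer_loss = -np.log(probs[answer_index])

    # Normalize both maps so that each sums to 1 before comparing them.
    a = np.asarray(model_attention, dtype=float)
    h = np.asarray(human_like_attention, dtype=float)
    a = a / a.sum()
    h = h / h.sum()

    # Supervision term: penalize disagreement with the human-like map.
    attention_loss = np.mean((a - h) ** 2)

    total = answer_loss + weight * attention_loss
    return total, answer_loss, attention_loss


# Example: a 2x2 attention grid and three candidate answers.
human_map = np.array([[0.7, 0.1], [0.1, 0.1]])
model_map = np.array([[0.25, 0.25], [0.25, 0.25]])
total, ans, att = attention_supervised_loss(
    [2.0, 0.5, 0.1], 0, model_map, human_map, weight=0.5)
```

With this formulation the supervision term is zero when the model attends exactly where the human-like map does, and `weight` controls how strongly attention disagreement is penalized relative to the answer loss.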

Citation

Pages: 7300-7307

Page count: 8

