Double Attention for Multi-Label Image Classification

Cited by: 11
Authors
Zhao, Haiying [1 ]
Zhou, Wei [2 ]
Hou, Xiaogang [1 ]
Zhu, Hui [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Natl Pilot Software Engn Sch, Sch Comp Sci, Beijing 100876, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Digital Media & Design Art, Beijing 100876, Peoples R China
Source
IEEE ACCESS | 2020, Vol. 8
Keywords
Correlation; Feature extraction; Task analysis; Image classification; Semantics; Spatial resolution; Predictive models; Multi-label classification; multi-scale features; attention mechanism; label correlation
DOI
10.1109/ACCESS.2020.3044446
CLC Number
TP [Automation technology, computer technology]
Subject Classification Code
0812
Abstract
Multi-label image classification is an essential task in image processing. Improving the correlation between labels by learning multi-scale features from images is a challenging problem. We propose a Double Attention Network (DAN) to strengthen the correlation between image feature regions and labels, as well as among the labels themselves. First, a dynamic learning strategy extracts multi-scale image features to handle the inconsistent scales of objects within an image. Second, to improve the correlation between image feature regions and labels, a spatial attention module focuses on the important regions of the image and learns their salient features, while a channel attention module models the correlation between channels to improve the correlation between labels. Finally, the output features of the two attention modules are fused to form the multi-label image classification model. Experiments on the MS-COCO 2014, Pascal VOC 2007, and NUS-WIDE datasets demonstrate that our model outperforms state-of-the-art models. In addition, visualization analyses show that our model has a strong ability to learn salient image features and capture label correlations.
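The abstract describes the architecture only at a high level, so the following is a minimal PyTorch-style sketch of the double-attention idea it outlines: a spatial self-attention branch and a channel self-attention branch whose outputs are fused before the multi-label classifier. It follows the familiar DANet-style formulation; all module names, tensor shapes, the element-wise fusion, and the classifier head are illustrative assumptions rather than the authors' implementation, and the multi-scale/dynamic feature-extraction stage is omitted.

# Minimal sketch of a double-attention head for multi-label classification.
# Assumptions (not from the paper): DANet-style attention, element-wise fusion,
# global-average-pooled linear classifier, backbone features of size B x C x H x W.
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """Self-attention over spatial positions (H*W) of a feature map."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)          # B x HW x C'
        k = self.key(x).view(b, -1, h * w)                              # B x C' x HW
        attn = torch.softmax(torch.bmm(q, k), dim=-1)                   # B x HW x HW
        v = self.value(x).view(b, c, h * w)                             # B x C x HW
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x


class ChannelAttention(nn.Module):
    """Self-attention over channels, modelling inter-channel (label) correlation."""

    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        feat = x.view(b, c, -1)                                         # B x C x HW
        attn = torch.softmax(torch.bmm(feat, feat.permute(0, 2, 1)), dim=-1)  # B x C x C
        out = torch.bmm(attn, feat).view(b, c, h, w)
        return self.gamma * out + x


class DoubleAttentionHead(nn.Module):
    """Fuse spatially and channel-attended features, then predict label scores."""

    def __init__(self, channels: int, num_labels: int):
        super().__init__()
        self.spatial = SpatialAttention(channels)
        self.channel = ChannelAttention()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(channels, num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = self.spatial(x) + self.channel(x)                       # element-wise fusion
        return self.classifier(self.pool(fused).flatten(1))             # B x num_labels logits


if __name__ == "__main__":
    # Toy check: backbone features of shape B x 2048 x 14 x 14, 80 labels (MS-COCO).
    head = DoubleAttentionHead(channels=2048, num_labels=80)
    scores = head(torch.randn(2, 2048, 14, 14))
    print(scores.shape)  # torch.Size([2, 80])

The head emits one logit per label, so it would typically be trained with a binary cross-entropy loss (e.g. BCEWithLogitsLoss), as is standard for multi-label classification.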
Pages: 225539-225550
Page count: 12