Spatial Context-Aware Object-Attentional Network for Multi-Label Image Classification

Cited by: 20
Authors
Zhang, Jialu [1 ,2 ]
Ren, Jianfeng [1 ]
Zhang, Qian [1 ]
Liu, Jiang [1 ,3 ]
Jiang, Xudong [4 ]
Affiliations
[1] Univ Nottingham Ningbo China, Sch Comp Sci, Ningbo 315100, Zhejiang, Peoples R China
[2] Chinese Acad Sci, Cixi Inst Biomed Engn, Ningbo 315201, Zhejiang, Peoples R China
[3] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen 518055, Guangdong, Peoples R China
[4] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
Funding
National Natural Science Foundation of China;
Keywords
Semantics; Image classification; Task analysis; Feature extraction; Context modeling; Object detection; Correlation; Multi-label image classification; adaptive patch expansion; spatial context-aware object detection; object clustering; spatial semantic attention;
DOI
10.1109/TIP.2023.3266161
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-label image classification is a fundamental but challenging task in computer vision. To tackle the problem, the label-related semantic information is often exploited, but the background context and spatial semantic information of related objects are not fully utilized. To address these issues, a multi-branch deep neural network is proposed in this paper. The first branch is designed to extract the discriminant information from regions of interest to detect target objects. In the second branch, a spatial context-aware approach is proposed to better capture the contextual information of an object in its surroundings by using an adaptive patch expansion mechanism. It helps the detection of small objects that are easily lost without the support of context information. The third branch, the object-attentional branch, exploits the spatial semantic relations between the target object and its related objects, to better detect partially occluded, small or dim objects with the support of easily detectable objects. To better encode such relations, an attention mechanism jointly considering the spatial and semantic relations between objects is developed. Two widely used benchmark datasets for multi-label classification, MS COCO and PASCAL VOC, are used to evaluate the proposed framework. The experimental results demonstrate that the proposed method outperforms the state-of-the-art methods for multi-label image classification.
Pages: 3000-3012
Page count: 13
Related papers
66 in total