Transformer-based Dual Relation Graph for Multi-label Image Recognition

被引:64
作者
Zhao, Jiawei [1 ]
Yan, Ke [2 ]
Zhao, Yifan [1 ]
Guo, Xiaowei [2 ]
Huang, Feiyue [2 ]
Li, Jia [1 ,3 ]
机构
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, SCSE, Beijing, Peoples R China
[2] Tencent Youtu Lab, Shanghai, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
来源
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021年
基金
中国国家自然科学基金;
关键词
D O I
10.1109/ICCV48922.2021.00023
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The simultaneous recognition of multiple objects in one image remains a challenging task, spanning multiple events in the recognition field such as various object scales, inconsistent appearances, and confused inter-class relationships. Recent research efforts mainly resort to the statistic label co-occurrences and linguistic word embedding to enhance the unclear semantics. Different from these researches, in this paper, we propose a novel Transformer-based Dual Relation learning framework, constructing complementary relationships by exploring two aspects of correlation, i.e., structural relation graph and semantic relation graph. The structural relation graph aims to capture long-range correlations from object context, by developing a cross-scale transformer-based architecture. The semantic graph dynamically models the semantic meanings of image objects with explicit semantic-aware constraints. In addition, we also incorporate the learnt structural relationship into the semantic graph, constructing a joint relation graph for robust representations. With the collaborative learning of these two effective relation graphs, our approach achieves new state-of-the-art on two popular multi-label recognition benchmarks, i.e. MS-COCO and VOC 2007 dataset.
引用
收藏
页码:163 / 172
页数:10
相关论文
共 46 条
[1]  
Ankur J.U., 2016, P 2016 C EMP METH NA, P2249
[2]  
Carion Nicolas, 2020, Computer Vision - ECCV 2020. 16th European Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12346), P213, DOI 10.1007/978-3-030-58452-8_13
[3]   Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition [J].
Chen, Tianshui ;
Lin, Liang ;
Chen, Riquan ;
Hui, Xiaolu ;
Wu, Hefeng .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (03) :1371-1384
[4]   Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition [J].
Chen, Tianshui ;
Xu, Muxin ;
Hui, Xiaolu ;
Wu, Hefeng ;
Lin, Liang .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :522-531
[5]  
Chen TS, 2018, AAAI CONF ARTIF INTE, P6730
[6]   Learning Graph Convolutional Networks for Multi-Label Recognition and Applications [J].
Chen, Zhao-Min ;
Wei, Xiu-Shen ;
Wang, Peng ;
Guo, Yanwen .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) :6969-6983
[7]   Multi-Label Image Recognition with Graph Convolutional Networks [J].
Chen, Zhao-Min ;
Wei, Xiu-Shen ;
Wang, Peng ;
Guo, Yanwen .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :5172-5181
[8]   MULTI-LABEL IMAGE RECOGNITION WITH JOINT CLASS-AWARE MAP DISENTANGLING AND LABEL CORRELATION EMBEDDING [J].
Chen, Zhao-Min ;
Wei, Xiu-Shen ;
Jin, Xin ;
Guo, Yanwen .
2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, :622-627
[9]  
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[10]  
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171