Transformer-based Dual Relation Graph for Multi-label Image Recognition

Times cited: 64

Authors:

Zhao, Jiawei [1]

Yan, Ke [2]

Zhao, Yifan [1]

Guo, Xiaowei [2]

Huang, Feiyue [2]

Li, Jia [1,3]

Affiliations:

[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, SCSE, Beijing, Peoples R China

[2] Tencent Youtu Lab, Shanghai, Peoples R China

[3] Peng Cheng Lab, Shenzhen, Peoples R China

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

Funding:

National Natural Science Foundation of China

DOI:

10.1109/ICCV48922.2021.00023

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The simultaneous recognition of multiple objects in one image remains a challenging task, involving several difficulties in the recognition field such as varying object scales, inconsistent appearances, and confusing inter-class relationships. Recent research efforts mainly resort to statistical label co-occurrences and linguistic word embeddings to enhance unclear semantics. Different from these works, in this paper we propose a novel Transformer-based Dual Relation learning framework, which constructs complementary relationships by exploring two aspects of correlation, i.e., a structural relation graph and a semantic relation graph. The structural relation graph aims to capture long-range correlations from object context by developing a cross-scale transformer-based architecture. The semantic relation graph dynamically models the semantic meanings of image objects with explicit semantic-aware constraints. In addition, we incorporate the learnt structural relationships into the semantic graph, constructing a joint relation graph for robust representations. With the collaborative learning of these two effective relation graphs, our approach achieves new state-of-the-art performance on two popular multi-label recognition benchmarks, i.e., the MS-COCO and VOC 2007 datasets.
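The abstract contrasts the proposed framework with methods built on statistical label co-occurrences. As a rough illustration of what such a statistic is (it is not the dual relation graph proposed in this paper), the sketch below estimates the conditional probability P(class j present | class i present) from multi-hot training labels. Graph-based approaches such as ML-GCN binarize and re-weight a matrix of this kind before using it as a label adjacency matrix. The function name `label_cooccurrence` and the toy labels are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def label_cooccurrence(labels: Sequence[Sequence[int]]) -> List[List[float]]:
    """Estimate P(class j present | class i present) from multi-hot labels.

    labels: one multi-hot vector per training image (nonzero = class present).
    Returns a C x C matrix M with M[i][j] = count(i and j) / count(i).
    The diagonal is left at 0.0, and rows of classes that never occur stay 0.0.
    """
    if not labels:
        return []
    num_classes = len(labels[0])
    if any(len(vec) != num_classes for vec in labels):
        raise ValueError("all label vectors must have the same length")

    pair_counts = [[0] * num_classes for _ in range(num_classes)]
    class_counts = [0] * num_classes
    for vec in labels:
        present = [c for c, v in enumerate(vec) if v]
        for i in present:
            class_counts[i] += 1
            for j in present:
                if i != j:
                    pair_counts[i][j] += 1

    # Normalise each row by how often its conditioning class occurs.
    return [
        [pair_counts[i][j] / class_counts[i] if class_counts[i] else 0.0
         for j in range(num_classes)]
        for i in range(num_classes)
    ]


# Toy example: 4 images, 3 classes.
toy = [[1, 1, 0], [1, 0, 1], [1, 1, 0], [0, 0, 1]]
matrix = label_cooccurrence(toy)
print(matrix[1][0], round(matrix[0][1], 2))  # → 1.0 0.67
```

The matrix is asymmetric by design: in the toy data class 1 never appears without class 0, so entry [1][0] is 1.0, while class 0 appears with class 1 in only two of its three images, giving about 0.67. Such a statistic is fixed at the dataset level, which is the property the abstract contrasts with its dynamically modelled relation graphs.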

Pages: 163-172

Number of pages: 10

References: 46 in total

