Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers

Cited by: 63
Authors
Wang, Wen [1 ,4 ]
Cao, Yang [1 ,2 ]
Zhang, Jing [3 ]
He, Fengxiang [4 ]
Zha, Zheng-Jun [1 ]
Wen, Yonggang [5 ]
Tao, Dacheng [4 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
[3] Univ Sydney, Sydney, NSW, Australia
[4] JD Explore Acad, Beijing, Peoples R China
[5] Nanyang Technol Univ, Singapore, Singapore
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021
Funding
National Natural Science Foundation of China; National Key R&D Program of China
Keywords
Object Detection; Detection Transformer; Domain Adaptation; Feature Alignment; Matching Consistency
DOI
10.1145/3474085.3475317
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Detection transformers have recently shown promising object detection results and attracted increasing attention. However, how to develop effective domain adaptation techniques to improve their cross-domain performance remains largely unexplored. In this paper, we delve into this topic and empirically find that direct feature distribution alignment on the CNN backbone brings only limited improvements, as it does not guarantee domain-invariant sequence features in the transformer for prediction. To address this issue, we propose a novel Sequence Feature Alignment (SFA) method that is specially designed for the adaptation of detection transformers. Technically, SFA consists of a domain query-based feature alignment (DQFA) module and a token-wise feature alignment (TDA) module. In DQFA, a novel domain query is used to aggregate and align global context from the token sequences of both domains. When deployed in the transformer encoder and decoder, DQFA reduces the domain discrepancy in global feature representations and object relations, respectively. Meanwhile, TDA aligns token features in the sequences of both domains, which reduces the domain gaps in local and instance-level feature representations in the transformer encoder and decoder, respectively. Besides, a novel bipartite matching consistency loss is proposed to enhance feature discriminability for robust object detection. Experiments on three challenging benchmarks show that SFA outperforms state-of-the-art domain adaptive object detection methods. Code has been made available at: https://github.com/encounter1997/SFA.
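The two alignment modules described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (which operates inside a DETR-style transformer with adversarial domain discriminators); the function names, shapes, and the single attention step below are illustrative assumptions. It shows how a learnable domain query can pool one global context vector from a token sequence (the kind of input DQFA would pass to a domain discriminator), while token-wise alignment (TDA) instead exposes every token feature individually.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def domain_query_aggregate(tokens, domain_query):
    """DQFA analogue: a single scaled dot-product attention step in which
    a learnable domain query attends over all N tokens, yielding one
    global context vector of dimension d for domain alignment.
    tokens: (N, d) sequence features; domain_query: (d,) vector."""
    d = tokens.shape[1]
    attn = softmax(tokens @ domain_query / np.sqrt(d))  # (N,) weights
    return attn @ tokens                                # (d,) global context

def tokenwise_features(tokens):
    """TDA analogue: each token feature is aligned individually, so the
    discriminator would see the full (N, d) sequence, token by token."""
    return tokens

rng = np.random.default_rng(0)
src = rng.standard_normal((100, 256))  # a source-domain token sequence
q = rng.standard_normal(256)           # domain query (assumed learnable)
g = domain_query_aggregate(src, q)
print(g.shape)  # (256,)
```

In the paper's setting, `g` from the source and target sequences would feed a global domain discriminator (encoder: scene-level context; decoder: object relations), while `tokenwise_features` would feed a per-token discriminator for local and instance-level alignment.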
Pages: 1730-1738 (9 pages)