DATR: Unsupervised Domain Adaptive Detection Transformer With Dataset-Level Adaptation and Prototypical Alignment

Times Cited: 0
Authors
Chen, Liang [1 ,2 ,3 ]
Han, Jianhong [1 ,2 ,3 ]
Wang, Yupei [1 ,2 ,3 ]
Affiliations
[1] Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China
[2] Chongqing Innovat Ctr, Beijing Inst Technol, Chongqing 401135, Peoples R China
[3] Natl Key Lab Space Born Intelligent Informat Proc, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Unsupervised domain adaptation; object detection
DOI
10.1109/TIP.2025.3527370
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
With the success of the DEtection TRansformer (DETR), numerous researchers have explored its effectiveness in addressing unsupervised domain adaptation tasks. Existing methods leverage carefully designed feature alignment techniques to align the backbone or encoder, yielding promising results. However, effectively aligning instance-level features within the detector's unique decoder structure has largely been neglected. Related techniques primarily align instance-level features in a class-agnostic manner, overlooking distinctions between features from different categories, which yields only limited improvements. Furthermore, the scope of current alignment modules in the decoder is often restricted to a single batch of images, failing to capture dataset-level cues and thereby severely constraining the detector's generalization to the target domain. To this end, we introduce a strong DETR-based detector named Domain Adaptive detection TRansformer (DATR) for unsupervised domain adaptive object detection. First, we propose the Class-wise Prototypes Alignment (CPA) module, which aligns cross-domain features in a class-aware manner by bridging the gap between the object detection and domain adaptation tasks. Second, the designed Dataset-level Alignment Scheme (DAS) leverages contrastive learning to explicitly guide the detector toward globally consistent representations and greater inter-class separability of instance-level features across the entire dataset, spanning both domains. Moreover, DATR incorporates a mean-teacher-based self-training framework that uses pseudo-labels generated by the teacher model to further mitigate domain bias. Extensive experimental results demonstrate the superior performance and generalization capability of the proposed DATR in multiple domain adaptation scenarios. Code is released at https://github.com/h751410234/DATR.
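To make the class-aware alignment idea concrete, below is a minimal sketch of how class-wise prototype alignment could be realized; it is not the authors' implementation (the released repository above is authoritative). It assumes a DETR-style decoder producing per-query embeddings `feats` of shape (N, embed_dim) with per-query class ids `labels`; names such as `ClassPrototypes`, `momentum`, and `prototype_alignment_loss` are illustrative assumptions.

```python
# Hypothetical sketch of class-wise prototype alignment; not the DATR code.
import torch
import torch.nn.functional as F


class ClassPrototypes:
    """Keeps one EMA prototype per class, updated from decoder query features."""

    def __init__(self, num_classes: int, embed_dim: int, momentum: float = 0.9):
        self.momentum = momentum
        self.prototypes = torch.zeros(num_classes, embed_dim)

    @torch.no_grad()
    def update(self, feats: torch.Tensor, labels: torch.Tensor) -> None:
        # feats: (N, embed_dim) query embeddings; labels: (N,) class ids.
        self.prototypes = self.prototypes.to(feats.device)
        for c in labels.unique():
            mean_c = feats[labels == c].mean(dim=0)
            self.prototypes[c] = (
                self.momentum * self.prototypes[c] + (1.0 - self.momentum) * mean_c
            )


def prototype_alignment_loss(
    feats: torch.Tensor, labels: torch.Tensor, other_domain: ClassPrototypes
) -> torch.Tensor:
    """Pulls each feature toward the same-class prototype of the other domain."""
    targets = other_domain.prototypes[labels].detach()
    return F.mse_loss(feats, targets)
```

In this sketch, source-domain queries (with ground-truth labels) and target-domain queries (with pseudo-labels from the teacher) would each maintain a prototype bank, and the loss pulls same-class features together across domains. The dataset-level contrastive scheme (DAS) and the mean-teacher EMA update described in the abstract are omitted here for brevity.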
Pages: 982-994
Page Count: 13