Consistency-aware Domain Adaptive Object Detection via Orthogonal Disentangling and Contrastive Learning

Cited by: 0
Authors
Zhong A.-Y. [1 ,3 ]
Wang R. [1 ,2 ,3 ]
Zhang H. [1 ,3 ]
Zou C. [1 ,3 ]
Jing L.-H. [1 ,3 ]
Affiliations
[1] State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing
[2] Zhejiang Lab, Hangzhou
[3] School of Cyber Security, University of Chinese Academy of Sciences, Beijing
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2023, Vol. 46, No. 04
Funding
National Natural Science Foundation of China;
Keywords
contrastive learning; deep learning; domain adaptation; feature disentangling; object detection;
DOI
10.11897/SP.J.1016.2023.00827
Abstract
Traditional object detection methods suffer from performance degradation when the training and test data come from different domains; for example, photos taken on a sunny day and on a cloudy day belong to two different domains, and an object detection model trained on sunny-day images usually does not perform well on cloudy-day images. This is caused by the domain shift between the two domains. Collecting data for every single domain is time-consuming and laborious, which increases the cost of model deployment and reduces the efficiency with which the model can be used. Domain adaptive object detection methods have been proposed to address this problem. Most domain adaptation methods eliminate the domain shift by finding domain-invariant feature representations shared by the two domains. Although existing domain adaptation methods have achieved great success, there are still differences between the domain-invariant features extracted from the source domain and those extracted from the target domain, which leads to poor performance when the model uses domain-invariant features from the target domain. Motivated by the idea of strengthening the semantic consistency of features to obtain better domain-invariant features, this paper proposes a Consistency-aware Domain Adaptive object detection network (ConDA) based on orthogonal disentangling and contrastive learning. Specifically, this paper first proposes an orthogonal relation consistency constraint built on orthogonally disentangled features, which improves the intra-domain consistency of features and the transferability of the model. Orthogonal constraints are applied during feature disentangling to keep the domain-invariant and domain-specific features distinct. On top of the orthogonal constraints, a relation consistency loss is applied by computing the instance-level feature relations before and after feature disentangling and constraining them to be the same. This loss preserves semantic information during feature disentangling and strengthens the intra-domain consistency of the features, thus improving the transferability of the model. Furthermore, to strengthen the inter-domain consistency of features, this paper proposes a contrastive learning branch with pseudo labels. Pseudo labels are assigned to high-confidence detection results on the target domain, and instance-level features from different domains are then aligned with contrastive learning, which reduces the domain shift between same-class instance-level features in different domains, aligns instance-level domain-invariant features across domains, and improves the inter-domain consistency. In addition, this paper adds a target-domain-like dataset generated by CycleGAN to the source domain dataset to reduce the domain shift between the source domain and the target domain, which further improves the detection results. To verify the proposed method, this paper evaluates it on a dataset pair commonly used in this field, with Cityscapes as the source domain and FoggyCityscapes as the target domain. Compared with the baseline method Instance Invariant Domain Adaptive Object Detection (IIOD), the proposed method achieves a mean Average Precision (mAP) improvement of 3.1%, with improvements of up to 6% on some specific subclasses; compared with other recent methods in the field, the mAP is improved by about 1%. This paper also tests the method on two other dataset pairs, and the results show that the method achieves good results on those datasets as well.
© 2023 Science Press. All rights reserved.
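
The abstract describes three loss terms: an orthogonality constraint between domain-invariant and domain-specific features, a relation consistency loss on instance-level features before and after disentangling, and a contrastive loss over source instances and high-confidence pseudo-labelled target instances. Below is a minimal PyTorch sketch of how such losses could look; all function names, tensor shapes, and the temperature value are illustrative assumptions and are not taken from the authors' implementation.

# Minimal sketch of the three consistency-related losses, assuming
# instance-level feature matrices of shape (N, D). Hyper-parameters
# and interfaces are illustrative, not the paper's released code.
import torch
import torch.nn.functional as F

def orthogonal_loss(f_di, f_ds):
    """Push paired domain-invariant (f_di) and domain-specific (f_ds)
    features toward orthogonality (squared cosine similarity -> 0)."""
    f_di = F.normalize(f_di, dim=1)
    f_ds = F.normalize(f_ds, dim=1)
    return (f_di * f_ds).sum(dim=1).pow(2).mean()

def relation_consistency_loss(f_before, f_after):
    """Keep the pairwise relations among instance features the same
    before and after disentangling (intra-domain consistency)."""
    r_before = F.normalize(f_before, dim=1) @ F.normalize(f_before, dim=1).T
    r_after = F.normalize(f_after, dim=1) @ F.normalize(f_after, dim=1).T
    return F.mse_loss(r_after, r_before)

def pseudo_label_contrastive_loss(f_src, y_src, f_tgt, y_tgt_pseudo, tau=0.1):
    """Supervised-contrastive alignment of source instances and
    confident pseudo-labelled target instances of the same class."""
    feats = F.normalize(torch.cat([f_src, f_tgt]), dim=1)
    labels = torch.cat([y_src, y_tgt_pseudo])
    sim = feats @ feats.T / tau
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)
    pos = (labels[:, None] == labels[None, :]) & ~eye
    # Log-softmax over all non-self pairs, then average over positives.
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float('-inf')),
                                     dim=1, keepdim=True)
    denom = pos.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos).sum(dim=1) / denom
    return loss[pos.any(dim=1)].mean()

In a full pipeline these terms would be added, with suitable weights, to the standard detection loss of a Faster R-CNN-style detector; the loss weights and the confidence threshold used to select pseudo labels are further assumptions.
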
Pages: 827-842
Page count: 15