Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation

Cited by: 9
Authors
Ba Hung Ngo [1 ]
Nhat-Tuong Do-Tran [2 ]
Tuan-Ngoc Nguyen [3 ]
Hae-Gon Jeon [4 ]
Tae Jong Choi [1 ]
Affiliations
[1] Chonnam Natl Univ, Grad Sch Data Sci, Gwangju, South Korea
[2] Natl Yang Ming Chiao Tung Univ, Dept Comp Sci, Hsinchu, Taiwan
[3] FPT Telecom, Digital Transformat Ctr, Hanoi, Vietnam
[4] GIST, AI Grad Sch, Gwangju, South Korea
Source
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024
Funding
National Research Foundation, Singapore;
DOI
10.1109/CVPR52733.2024.02697
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most domain adaptation (DA) methods are based on either convolutional neural networks (CNNs) or vision transformers (ViTs). They use these architectures as encoders to align the distribution differences between domains without considering their distinct characteristics. For instance, ViTs excel in accuracy thanks to their superior ability to capture global representations, while CNNs have an advantage in capturing local representations. This observation led us to design a hybrid method that fully exploits the strengths of both ViT and CNN, called Explicitly Class-specific Boundaries (ECB). ECB learns CNN on ViT to combine their distinct strengths. In particular, we leverage ViT's properties to explicitly find class-specific decision boundaries by maximizing the discrepancy between the outputs of two classifiers, thereby detecting target samples far from the source support. In contrast, the CNN encoder clusters target features based on the previously defined class-specific boundaries by minimizing the discrepancy between the probabilities of the two classifiers. Finally, ViT and CNN mutually exchange knowledge to improve the quality of pseudo labels and to reduce the knowledge discrepancies between these models. Compared to conventional DA methods, ECB achieves superior performance, which verifies the effectiveness of this hybrid model. The project website can be found here.
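The bi-classifier discrepancy at the heart of this maximize-then-minimize scheme can be sketched in plain Python. This is a minimal illustration only: the L1 distance over softmax probabilities and the function names are assumptions in the spirit of bi-classifier adversarial DA methods, not the authors' exact implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classifier_discrepancy(logits_a, logits_b):
    """Mean absolute (L1) distance between two classifiers' class probabilities.

    In the scheme described in the abstract, the two classifiers are first
    trained to MAXIMIZE this quantity on target samples, exposing samples
    that fall outside the source support; the CNN encoder is then trained
    to MINIMIZE it, pulling target features inside the class-specific
    boundaries.
    """
    pa, pb = softmax(logits_a), softmax(logits_b)
    return sum(abs(x - y) for x, y in zip(pa, pb)) / len(pa)
```

When the two classifiers agree exactly, the discrepancy is zero; target samples on which they disagree strongly lie near or outside the class-specific boundaries and receive a large value, which is what the adversarial max/min steps exploit.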
Pages: 28545-28554
Page count: 10