Unsupervised Cross-domain Object Detection Based on Progressive Multi-source Transfer

Cited by: 0
Authors
Li W. [1 ,2 ]
Wang M. [1 ,2 ]
Affiliations
[1] School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming
[2] Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming
Source
Zidonghua Xuebao/Acta Automatica Sinica | 2022 / Vol. 48 / Iss. 09
Funding
National Natural Science Foundation of China;
Keywords
domain adaptation; multi-source domain; object detection; self-training; transfer learning
DOI
10.16383/j.aas.c190532
Abstract
To address the difficulty of collecting manually labeled training samples for object detection, this paper proposes an unsupervised cross-domain object detection method that progressively adapts the model at the pixel level and the feature level. Existing pixel-level domain adaptation methods generate translated images with a single style and inconsistent content structure. To solve this problem, this paper embeds input images into a domain-invariant content space and a domain-specific attribute space, then combines representations from the two spaces to synthesize diverse translated images that preserve spatial semantic information, enabling label transfer. In addition, for feature-level domain adaptation, to alleviate the source-bias problem caused by a single source domain, we treat the generated diverse labeled images as source-domain data and design a multi-domain discriminator to learn multi-domain-invariant representations. Finally, to further enhance detection performance on the target domain, we propose a self-training framework that alternately generates pseudo labels on the target training data and retrains the detector with them. Experimental results on the Cityscapes & Foggy Cityscapes and VOC07 & Clipart1k dataset pairs demonstrate that, compared with current unsupervised cross-domain detection methods, the proposed framework achieves better transferability. © 2022 Science Press. All rights reserved.
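The self-training step summarized in the abstract follows the standard pseudo-labeling recipe. Below is a minimal PyTorch sketch of one alternation, assuming a torchvision Faster R-CNN as the detector and an illustrative confidence threshold SCORE_THRESH; neither is specified by the paper, and both are stand-ins for the authors' actual detector and filtering rule.

```python
# Minimal sketch of alternating self-training: pseudo-label unlabeled
# target images with a confidence cutoff, then retrain on those labels.
# The detector and SCORE_THRESH are illustrative assumptions.
import torch
import torchvision

SCORE_THRESH = 0.8  # assumed cutoff for accepting a detection as a pseudo label

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

@torch.no_grad()
def pseudo_label(images):
    """Keep only high-confidence detections as pseudo ground truth."""
    model.eval()
    outputs = model(images)  # list of dicts with "boxes", "labels", "scores"
    return [{"boxes": o["boxes"][o["scores"] >= SCORE_THRESH],
             "labels": o["labels"][o["scores"] >= SCORE_THRESH]}
            for o in outputs]

def self_train_round(images, optimizer):
    """One alternation: generate pseudo labels, then fine-tune on them."""
    targets = pseudo_label(images)
    model.train()
    losses = model(images, targets)  # standard detection loss dict
    loss = sum(losses.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, self_train_round would be iterated over the target-domain training set for several rounds, with pseudo labels regenerated between rounds as the detector improves.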
Pages: 2337-2351
Number of pages: 14
References
37 in total
[1]  
Liu L, Ouyang W L, Wang X G, Fieguth P, Chen J, Liu X W, et al., Deep learning for generic object detection: A survey, (2018)
[2]  
Zhang Hui, Wang Kun-Feng, Wang Fei-Yue, Advances and perspectives on applications of deep learning in visual object detection, Acta Automatica Sinica, 43, 8, pp. 1289-1305, (2017)
[3]  
Krizhevsky A, Sutskever I, Hinton G E., ImageNet classification with deep convolutional neural networks, Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS), pp. 1097-1105, (2012)
[4]  
Girshick R, Donahue J, Darrell T, Malik J., Rich feature hierarchies for accurate object detection and semantic segmentation, Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580-587, (2014)
[5]  
Girshick R., Fast R-CNN, Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1440-1448, (2015)
[6]  
Ren S Q, He K M, Girshick R, Sun J., Faster R-CNN: Towards real-time object detection with region proposal networks, Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS), pp. 91-99, (2015)
[7]  
Redmon J, Farhadi A., YOLOv3: An incremental improvement, (2018)
[8]  
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, et al., SSD: Single shot multibox detector, Proceedings of the 14th European Conference on Computer Vision (ECCV), pp. 21-37, (2016)
[9]  
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, et al., The Cityscapes dataset for semantic urban scene understanding, Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3213-3223, (2016)
[10]  
Sakaridis C, Dai D X, Van Gool L., Semantic foggy scene understanding with synthetic data, International Journal of Computer Vision, 126, 9, pp. 973-992, (2018)