Domain adaptation with temporal ensembling to local attention region search for object detection

Cited by: 1
Authors
Shi, Haobin [1]
He, Ziming [1]
Hwang, Kao-Shing [1, 2, 3]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China
[2] Natl Sun Yat sen Univ, Dept Elect Engn, Kaohsiung 80424, Taiwan
[3] Kaohsiung Med Univ, Dept Healthcare Adm & Med Informat, Kaohsiung 80708, Taiwan
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Reinforcement learning; Object detection; Domain adaptation; Temporal ensembling; Attention mechanism; Medical imaging;
DOI
10.1016/j.knosys.2024.112846
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Object detection relies heavily on supervised learning, which requires labeled training data. However, manual labeling often cannot keep pace with data collection, and models trained on one dataset may not generalize to new datasets with different characteristics, a problem known as domain shift. Domain adaptation addresses this problem by leveraging labeled data from a source domain and unlabeled data from a target domain to improve performance on the target domain. Existing domain adaptation architectures, however, leave considerable room for improvement in target-domain detection accuracy. In addition, searching feature maps globally is computationally expensive. These problems make it difficult to apply domain adaptive object detection directly to tasks such as medical imaging. To this end, this article proposes two architectures: Region-based Object Detection with Domain Adaptation and Temporal Ensembling (DATE) and the Local Attention Region Search Algorithm (LARSA). DATE combines domain adaptation and temporal ensembling to enhance feature alignment between domains. LARSA employs an attention mechanism to search efficiently for regions of interest and to decide when to terminate the search early. Experiments on various datasets demonstrate that the proposed approaches improve object detection performance under domain shift and reduce computational cost. The proposed framework has the potential to further promote the application of object detection in medical imaging.
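
The abstract does not spell out how DATE applies temporal ensembling. As a rough illustration only, the sketch below shows the generic temporal-ensembling update known from semi-supervised learning: an exponential moving average of a model's predictions on an unlabeled sample, with a startup bias correction. The function name `temporal_ensemble_update` and the parameters `alpha` and `step` are hypothetical and are not taken from the paper.

```python
def temporal_ensemble_update(ensemble, prediction, alpha, step):
    """One generic temporal-ensembling update for a single sample.

    ensemble   -- running average of past predictions (list of floats)
    prediction -- current model prediction (list of floats, same length)
    alpha      -- momentum in [0, 1); an assumed hyperparameter
    step       -- 1-based count of updates applied so far, including this one

    Returns the new running average and the bias-corrected target.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if step < 1:
        raise ValueError("step must be >= 1")
    if len(ensemble) != len(prediction):
        raise ValueError("ensemble and prediction must have the same length")

    # Exponential moving average of the predictions seen so far.
    new_ensemble = [alpha * z + (1.0 - alpha) * p
                    for z, p in zip(ensemble, prediction)]

    # The average starts at zero, so early values are biased toward zero;
    # dividing by (1 - alpha**step) removes that startup bias.
    correction = 1.0 - alpha ** step
    target = [z / correction for z in new_ensemble]
    return new_ensemble, target


# Usage: two successive predictions for one unlabeled sample.
ensemble = [0.0, 0.0]
ensemble, target = temporal_ensemble_update(ensemble, [0.2, 0.8], alpha=0.6, step=1)
ensemble, target = temporal_ensemble_update(ensemble, [0.4, 0.6], alpha=0.6, step=2)
```

In this generic formulation the corrected target serves as a consistency target for the unlabeled sample in later training steps. How DATE combines it with domain alignment is described in the paper itself, not here.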
Pages: 12