Few-Shot Object Detection via Dual-Domain Feature Fusion and Patch-Level Attention

Cited by: 0

Authors:

Ren, Guangli [1,2]

Liu, Jierui [1,2]

Wang, Mengyao [1,2]

Guan, Peiyu [1,2]

Cao, Zhiqiang [1,2]

Yu, Junzhi [3]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China

[3] Peking Univ, Dept Adv Mfg & Robot, Beijing 100871, Peoples R China

Source:

TSINGHUA SCIENCE AND TECHNOLOGY | 2025 / Vol. 30 / Issue 03

Funding:

National Natural Science Foundation of China; Beijing Natural Science Foundation

Keywords:

Training; Visualization; Adaptation models; Head; Accuracy; Diversity reception; Object detection; Feature extraction; few-shot object detection; dual-domain feature fusion; patch-level attention; CHALLENGES;

DOI:

10.26599/TST.2024.9010031

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Few-shot object detection has attracted much attention for its ability to detect novel-class objects using limited annotated data. Transfer learning-based solutions have become popular because they are simple to train and achieve good accuracy; however, enriching feature diversity during training remains challenging, and fine-grained features are also insufficient for novel-class detection. To address these problems, this paper proposes a novel few-shot object detection method based on dual-domain feature fusion and patch-level attention. On top of the original base domain, an elementary domain with more category-agnostic features is superposed to construct a two-stream backbone, which helps enrich feature diversity. To better integrate the various features, a dual-domain feature fusion is designed, in which feature pairs of the same size are complementarily fused to extract more discriminative features. Moreover, a patch-wise feature refinement, termed patch-level attention, is presented to mine the internal relations among patches, which enhances adaptability to novel classes. In addition, a weighted classification loss is introduced to assist the fine-tuning of the classifier by incorporating extra features from the FPN of the base training model. In this way, the few-shot detection quality for novel-class objects is improved. Experiments on the PASCAL VOC and MS COCO datasets verify the effectiveness of the method.
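
The abstract describes patch-level attention only as a patch-wise refinement that mines relations among patches, and gives no architectural details. As a rough illustration of that general idea, and not the authors' module, the NumPy sketch below splits a feature map into non-overlapping patches and refines each patch with plain scaled dot-product self-attention over all patches. The function name `patch_self_attention`, the `patch_size` parameter, the residual connection, and the absence of learned projections are all assumptions made for this example.

```python
import numpy as np

def patch_self_attention(feature_map, patch_size):
    """Generic patch-wise self-attention (illustrative; not the paper's design).

    feature_map: array of shape (H, W, C); H and W must be divisible by patch_size.
    Returns the refined (H, W, C) map and the (num_patches, num_patches) weights.
    """
    h, w, c = feature_map.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size

    # Cut the map into a gh x gw grid of patches and flatten each patch to a vector.
    patches = feature_map.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, -1)

    # Scaled dot-product similarity between every pair of patches.
    scores = patches @ patches.T / np.sqrt(patches.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax

    # Assumed residual refinement: each patch plus its attention-weighted mix.
    refined = patches + weights @ patches

    # Fold the patch vectors back into the original (H, W, C) layout.
    refined = refined.reshape(gh, gw, patch_size, patch_size, c)
    refined = refined.transpose(0, 2, 1, 3, 4).reshape(h, w, c)
    return refined, weights

rng = np.random.default_rng(0)
fm = rng.normal(size=(8, 8, 4))
refined, weights = patch_self_attention(fm, patch_size=4)
print(refined.shape, weights.shape)
```

A trained detector would normally apply learned query, key, and value projections before computing the similarities; they are omitted here to keep the sketch short.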


Pages: 1237-1250

Page count: 14

