Adaptive feature fusion with attention mechanism for multi-scale target detection

Cited by: 32

Authors:

Ju, Moran [1,2,3,4,5]

Luo, Jiangning [6]

Wang, Zhongbo [1,2,3,4,5]

Luo, Haibo [1,2,4,5]

Affiliations:

[1] Chinese Acad Sci, Shenyang Inst Automat, Shenyang 110016, Liaoning, Peoples R China

[2] Chinese Acad Sci, Inst Robot & Intelligent Mfg, Shenyang 110016, Liaoning, Peoples R China

[3] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[4] Chinese Acad Sci, Key Lab Opt Elect Informat Proc, Shenyang 110016, Liaoning, Peoples R China

[5] Key Lab Image Understanding & Comp Vis, Shenyang 110016, Liaoning, Peoples R China

[6] McGill Univ, Montreal, PQ H3A 0G4, Canada

Source:

NEURAL COMPUTING & APPLICATIONS | 2021, Vol. 33, Issue 07

Keywords:

Deep learning; Target detection; Adaptive feature fusion; Attention mechanism; RECOGNITION;

DOI:

10.1007/s00521-020-05150-9

Chinese Library Classification number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

To detect targets of different sizes, target detectors such as YOLO V3 and DSSD use multi-scale output. To improve detection performance, YOLO V3 and DSSD perform feature fusion by combining two adjacent scales. However, fusing features only between adjacent scales is not sufficient, because it does not take advantage of the features at other scales. Moreover, concatenation, a common operation for feature fusion, provides no mechanism for learning the importance and correlation of the features at different scales. In this paper, we propose adaptive feature fusion with attention mechanism (AFFAM) for multi-scale target detection. AFFAM uses a pathway layer and a subpixel convolution layer to resize the feature maps, which helps it learn better and more complex feature mappings. In addition, AFFAM uses a global attention mechanism and a spatial position attention mechanism to adaptively learn, respectively, the correlation of the channel features and the importance of the spatial features at different scales. Finally, we combine AFFAM with YOLO V3 to build an efficient multi-scale target detector. Comparative experiments are conducted on the PASCAL VOC, KITTI and Smart UVM datasets. YOLO V3 with AFFAM achieves 84.34% mean average precision (mAP) at 19.9 FPS on the PASCAL VOC dataset, 87.2% mAP at 21 FPS on the KITTI dataset and 99.22% mAP at 20.6 FPS on the Smart UVM dataset, outperforming other state-of-the-art target detectors.
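
As a rough illustration of two ideas named in the abstract, the sketch below shows a subpixel (pixel-shuffle) rearrangement that upsamples a coarse feature map, followed by a simple per-channel gate applied to the concatenated maps. It is not the authors' AFFAM implementation: the function names `pixel_shuffle` and `channel_attention` are hypothetical, the gate uses a fixed sigmoid of the channel mean instead of learned weights, and the spatial position attention is omitted.

```python
import math

def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    assert c_in % (r * r) == 0, "channel count must be divisible by r*r"
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h * r):
            for j in range(w * r):
                # Each output pixel is read from one of the r*r input channels.
                src = c * r * r + (i % r) * r + (j % r)
                out[c][i][j] = x[src][i // r][j // r]
    return out

def channel_attention(x):
    """Scale each channel by a sigmoid gate of its global average.

    Assumption: a real attention module would learn this gate; here it is
    a fixed function, used only to show the reweighting step.
    """
    out = []
    for ch in x:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gate = 1.0 / (1.0 + math.exp(-mean))
        out.append([[v * gate for v in row] for row in ch])
    return out

# Coarse map: 4 channels of size 1x1; fine map: 1 channel of size 2x2.
coarse = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]
fine = [[[0.0, 0.0], [0.0, 0.0]]]

upsampled = pixel_shuffle(coarse, 2)      # now 1 channel of size 2x2
fused = channel_attention(upsampled + fine)  # concatenate channels, then gate
```

Here the list concatenation `upsampled + fine` stands in for channel-wise concatenation, and the gate rescales each channel of the fused map according to its global average.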

Pages: 2769-2781

Number of pages: 13

50 records in total

[31] Ship Detection in SAR Images Based on Multi-Scale Feature Extraction and Adaptive Feature Fusion
Zhou, Kexue
Zhang, Min
Wang, Hai
Tan, Jinlin
REMOTE SENSING, 2022, 14 (03)
[32] Remote sensing image target detection combining multi-scale and attention mechanism
Zhang Y.-Z.
Guo W.
Cai Z.-Q.
Li W.-B.
Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2022, 56 (11) : 2215 - 2223
[33] Multi-Scale Feature Extraction Method of Hyperspectral Image with Attention Mechanism
Xu Zhangchi
Guo Baofeng
Wu Wenhao
You Jingyun
Su Xiaotong
LASER & OPTOELECTRONICS PROGRESS, 2024, 61 (04)
[34] Multi-scale Vertical Cross-layer Feature Aggregation and Attention Fusion Network for Object Detection
Gao, Wenting
Li, Xiaojuan
Han, Yu
Liu, Yue
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 139 - 150
[35] Binocular Depth Estimation Algorithm Based on Multi-Scale Attention Feature Fusion
Yang Huitong
Lei Lang
Lin Yongchun
LASER & OPTOELECTRONICS PROGRESS, 2022, 59 (18)
[36] Spatial Small Target Detection Method Based on Multi-Scale Feature Fusion Pyramid
Wang, Xiaojuan
Liu, Yuepeng
Xu, Haitao
Xue, Changbin
APPLIED SCIENCES-BASEL, 2024, 14 (13)
[37] Deep Multi-Scale Feature Fusion Target Detection Algorithm Based on Deep Learning
Liu Xin
Chen Siyi
Chen Xiaolong
Du Xinhao
LASER & OPTOELECTRONICS PROGRESS, 2021, 58 (12)
[38] Robust coverless image steganography based on DenseUNet with multi-scale feature fusion and attention mechanism
Li, Xiaopeng
Zhang, Qiuyu
Li, Zhe
SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (11) : 8251 - 8266
[39] Multi-scale Convolutional Feature Fusion Network Based on Attention Mechanism for IoT Traffic Classification
Niandong Liao
Jiayu Guan
International Journal of Computational Intelligence Systems, 17
[40] Dynamically Adaptive Deformable Feature Fusion for multi-scale character detection in ancient documents
Bermudez-Gonzalez, Mauricio
Jalali, Amin
Lee, Minho
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 139
