Adaptive feature fusion with attention mechanism for multi-scale target detection

Cited by: 32
Authors
Ju, Moran [1, 2, 3, 4, 5]
Luo, Jiangning [6]
Wang, Zhongbo [1, 2, 3, 4, 5]
Luo, Haibo [1, 2, 4, 5]
Affiliations
[1] Chinese Acad Sci, Shenyang Inst Automat, Shenyang 110016, Liaoning, Peoples R China
[2] Chinese Acad Sci, Inst Robot & Intelligent Mfg, Shenyang 110016, Liaoning, Peoples R China
[3] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[4] Chinese Acad Sci, Key Lab Opt Elect Informat Proc, Shenyang 110016, Liaoning, Peoples R China
[5] Key Lab Image Understanding & Comp Vis, Shenyang 110016, Liaoning, Peoples R China
[6] McGill Univ, Montreal, PQ H3A 0G4, Canada
Keywords
Deep learning; Target detection; Adaptive feature fusion; Attention mechanism; RECOGNITION;
DOI
10.1007/s00521-020-05150-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
To detect targets of different sizes, target detectors such as YOLO V3 and DSSD use multi-scale outputs. To improve detection performance, YOLO V3 and DSSD fuse features by combining two adjacent scales. However, fusing features only between adjacent scales is not sufficient, because it does not take advantage of the features at other scales. Moreover, concatenation, a common operation for feature fusion, provides no mechanism for learning the importance and correlation of features at different scales. In this paper, we propose adaptive feature fusion with attention mechanism (AFFAM) for multi-scale target detection. AFFAM uses a pathway layer and a subpixel convolution layer to resize the feature maps, which helps to learn better and more complex feature mappings. In addition, AFFAM uses a global attention mechanism and a spatial position attention mechanism, respectively, to adaptively learn the correlation of the channel features and the importance of the spatial features at different scales. Finally, we combine AFFAM with YOLO V3 to build an efficient multi-scale target detector. Comparative experiments are conducted on the PASCAL VOC, KITTI and Smart UVM datasets. YOLO V3 with AFFAM achieves 84.34% mean average precision (mAP) at 19.9 FPS on PASCAL VOC, 87.2% mAP at 21 FPS on KITTI and 99.22% mAP at 20.6 FPS on Smart UVM, outperforming other state-of-the-art target detectors.
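The fusion idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the pathway layer acts like a space-to-depth rearrangement, and it shows only the pixel-rearrangement step of subpixel convolution, without the learned convolution. The gates are simple sigmoid gates, and a random matrix `w` stands in for learned weights. All function names here are hypothetical.

```python
import numpy as np

def space_to_depth(x, r):
    """Downsample (C, H, W) -> (C*r*r, H//r, W//r) by moving r x r blocks into channels."""
    c, h, w = x.shape
    x = x.reshape(c, h // r, r, w // r, r)
    return x.transpose(0, 2, 4, 1, 3).reshape(c * r * r, h // r, w // r)

def depth_to_space(x, r):
    """Upsample (C*r*r, H, W) -> (C, H*r, W*r): the rearrangement used by subpixel convolution."""
    c, h, w = x.shape
    c_out = c // (r * r)
    x = x.reshape(c_out, r, r, h, w)
    return x.transpose(0, 3, 1, 4, 2).reshape(c_out, h * r, w * r)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w):
    """Gate each channel with a weight derived from its global average (w is a placeholder)."""
    squeeze = x.mean(axis=(1, 2))      # (C,) global descriptor per channel
    gate = sigmoid(w @ squeeze)        # (C,) weights in (0, 1)
    return x * gate[:, None, None]

def spatial_attention(x):
    """Gate each spatial position with a weight derived from the channel-wise mean."""
    gate = sigmoid(x.mean(axis=0))     # (H, W) weights in (0, 1)
    return x * gate[None, :, :]

def fuse(fine, mid, coarse, w):
    """Bring three scales to the middle resolution, concatenate, then apply both gates."""
    cat = np.concatenate(
        [space_to_depth(fine, 2), mid, depth_to_space(coarse, 2)], axis=0
    )
    return spatial_attention(channel_attention(cat, w))

rng = np.random.default_rng(0)
fine = rng.standard_normal((2, 8, 8))    # high-resolution feature map
mid = rng.standard_normal((4, 4, 4))     # target resolution
coarse = rng.standard_normal((8, 2, 2))  # low-resolution feature map
w = rng.standard_normal((14, 14))        # stand-in for learned channel weights
fused = fuse(fine, mid, coarse, w)
print(fused.shape)
```

Both rearrangements are lossless, so resizing discards no activations, unlike pooling or interpolation. The two gates then rescale the concatenated features per channel and per position.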
Pages: 2769-2781
Number of pages: 13
Related papers
50 in total
  • [31] Ship Detection in SAR Images Based on Multi-Scale Feature Extraction and Adaptive Feature Fusion
    Zhou, Kexue
    Zhang, Min
    Wang, Hai
    Tan, Jinlin
    REMOTE SENSING, 2022, 14 (03)
  • [32] Remote sensing image target detection combining multi-scale and attention mechanism
    Zhang Y.-Z.
    Guo W.
    Cai Z.-Q.
    Li W.-B.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2022, 56 (11) : 2215 - 2223
  • [33] Multi-Scale Feature Extraction Method of Hyperspectral Image with Attention Mechanism
    Xu Zhangchi
    Guo Baofeng
    Wu Wenhao
    You Jingyun
    Su Xiaotong
    LASER & OPTOELECTRONICS PROGRESS, 2024, 61 (04)
  • [34] Multi-scale Vertical Cross-layer Feature Aggregation and Attention Fusion Network for Object Detection
    Gao, Wenting
    Li, Xiaojuan
    Han, Yu
    Liu, Yue
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 139 - 150
  • [35] Binocular Depth Estimation Algorithm Based on Multi-Scale Attention Feature Fusion
    Yang Huitong
    Lei Lang
    Lin Yongchun
    LASER & OPTOELECTRONICS PROGRESS, 2022, 59 (18)
  • [36] Spatial Small Target Detection Method Based on Multi-Scale Feature Fusion Pyramid
    Wang, Xiaojuan
    Liu, Yuepeng
    Xu, Haitao
    Xue, Changbin
    APPLIED SCIENCES-BASEL, 2024, 14 (13)
  • [37] Deep Multi-Scale Feature Fusion Target Detection Algorithm Based on Deep Learning
    Liu Xin
    Chen Siyi
    Chen Xiaolong
    Du Xinhao
    LASER & OPTOELECTRONICS PROGRESS, 2021, 58 (12)
  • [38] Robust coverless image steganography based on DenseUNet with multi-scale feature fusion and attention mechanism
    Li, Xiaopeng
    Zhang, Qiuyu
    Li, Zhe
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (11) : 8251 - 8266
  • [39] Multi-scale Convolutional Feature Fusion Network Based on Attention Mechanism for IoT Traffic Classification
    Niandong Liao
    Jiayu Guan
    International Journal of Computational Intelligence Systems, 17
  • [40] Dynamically Adaptive Deformable Feature Fusion for multi-scale character detection in ancient documents
    Bermudez-Gonzalez, Mauricio
    Jalali, Amin
    Lee, Minho
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 139