MSSD: multi-scale object detector based on spatial pyramid depthwise convolution and efficient channel attention mechanism

Citations: 0
Authors
Yipeng Zhou
Huaming Qian
Peng Ding
Affiliation
[1] Harbin Engineering University, College of Intelligent Systems Science and Engineering
Source
Journal of Real-Time Image Processing | 2023, Vol. 20
Keywords
SPDC; ECAM; ECAM-FPN; MSSD
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Object detection has advanced considerably across many fields. In complex application scenarios, however, target features are often inconspicuous and the range of object scales is large, so detectors fall short of the desired results, especially for small targets. This paper proposes MSSD, a multi-scale object detector that improves on SSD using spatial pyramid depthwise convolution (SPDC) and an efficient channel attention mechanism (ECAM). First, ResNet50 replaces VGG as the backbone to obtain more representative features. Second, a plug-and-play spatial pyramid depthwise convolution module, SPDC, is proposed to enlarge the receptive field and strengthen multi-scale feature extraction. Third, a straightforward efficient channel attention mechanism (ECAM) is designed to reweight features along the channel dimension and yield more robust features. Finally, a feature pyramid network (FPN) with ECAM (ECAM-FPN) is introduced at the prediction feature layers for deep feature fusion, producing multi-scale features rich in both semantic and detail information.
For 300×300 input, MSSD achieves 82.5% mAP on the PASCAL VOC07+12 dataset at 56 FPS and 48.2% mAP on the MS COCO2017 dataset, which are 8.2% and 7.0% higher than SSD(300), respectively.
Detection of small targets is improved by 0.8% on COCO and by 6.5% when scaled to 512×512. The proposed method achieves significant gains in cross-scale target detection while meeting real-time requirements, and is comparable with other methods.
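The channel reweighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how ECAM is built, so the form below is an assumption borrowed from ECA-Net-style attention (global average pooling, a 1D convolution across channels, then a sigmoid gate). The function name `channel_attention`, the nested-list tensor layout, and the fixed `kernel` values are all hypothetical and chosen only for illustration; in a real detector the kernel would be learned.

```python
import math


def channel_attention(feature_map, kernel):
    """Rescale each channel of a C x H x W feature map given as nested lists.

    Assumed ECA-Net-style form, not the paper's confirmed ECAM design:
    global average pool -> 1D convolution across channels -> sigmoid -> scale.
    `kernel` is expected to have odd length.
    """
    num_channels = len(feature_map)
    pad = len(kernel) // 2

    # Squeeze: reduce every channel to its spatial average.
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))

    # Local cross-channel interaction: zero-padded 1D convolution over channels.
    weights = []
    for c in range(num_channels):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = c + k - pad
            if 0 <= idx < num_channels:
                acc += w * pooled[idx]
        weights.append(1.0 / (1.0 + math.exp(-acc)))  # sigmoid gate in (0, 1)

    # Excite: multiply every value in a channel by that channel's weight.
    scaled = [
        [[v * weights[c] for v in row] for row in channel]
        for c, channel in enumerate(feature_map)
    ]
    return scaled, weights


# Three 2 x 2 channels; the identity-like kernel makes each weight depend
# only on its own channel average.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
scaled, weights = channel_attention(features, kernel=[0.0, 1.0, 0.0])
```

With this toy kernel, channels with larger average activation receive a gate closer to 1, while an all-zero channel receives a gate of exactly 0.5. A wider kernel with non-zero side taps would let neighbouring channels influence each other's weights.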