MSSD: multi-scale object detector based on spatial pyramid depthwise convolution and efficient channel attention mechanism

Cited by: 1

Authors:

Zhou, Yipeng [1]

Qian, Huaming [1]

Ding, Peng [1]

Affiliations:

[1] Harbin Engn Univ, Coll Intelligent Syst Sci & Engn, Harbin 150001, Peoples R China

Source:

JOURNAL OF REAL-TIME IMAGE PROCESSING | 2023 / Vol. 20 / Issue 05

Funding:

National Natural Science Foundation of China;

Keywords:

SPDC; ECAM; ECAM-FPN; MSSD;

DOI:

10.1007/s11554-023-01358-9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Object detection has developed rapidly and made remarkable progress in many fields. In complex application scenarios, however, target features are often inconspicuous and the range of object scales is large, so detectors fail to achieve the desired results, especially for small targets. This paper proposes MSSD, a multi-scale object detector derived from SSD and based on spatial pyramid depthwise convolution (SPDC) and an efficient channel attention mechanism (ECAM). First, ResNet50 replaces VGG as the backbone to obtain more representative features. Second, a plug-and-play spatial pyramid depthwise convolution module (SPDC) is proposed to enlarge the receptive field and strengthen multi-scale feature extraction. Third, a straightforward efficient channel attention mechanism (ECAM) is designed to rescale the channel-wise feature weights and derive more robust features. Finally, a feature pyramid network (FPN) with ECAM (ECAM-FPN) is introduced in the prediction feature layer for deep feature fusion, yielding multi-scale features rich in both semantic and detail information. For 300x300 input, MSSD achieves 82.5% mAP on the PASCAL VOC07+12 dataset at 56 FPS and 48.2% mAP on the MS COCO2017 dataset, which are 8.2% and 7.0% higher than SSD(300), respectively. Detection of small targets improves by 0.8% on COCO, and by 6.5% when the input is scaled to 512x512. The proposed method achieves significant gains in cross-scale target detection while meeting real-time requirements, and is comparable with other methods.
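The abstract says that ECAM scales the weights of features on channels but gives no further detail. As a rough illustration of that general idea only, the sketch below implements a common lightweight form of channel attention in plain Python: global average pooling per channel, a small 1-D convolution across neighbouring channels, a sigmoid gate, and a per-channel rescale. The function name `channel_attention`, the fixed `kernel` argument, and the zero padding are assumptions made for this example; they are not taken from the paper, and the authors' ECAM may be built differently.

```python
import math


def channel_attention(feature_map, kernel):
    """Reweight the channels of a C x H x W feature map given as nested lists.

    Hypothetical helper for illustration; `kernel` is an odd-length list of
    1-D convolution weights that would normally be learned.
    """
    num_channels = len(feature_map)
    # Squeeze: global average pooling gives one scalar per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Local cross-channel interaction: 1-D convolution with zero padding.
    half = len(kernel) // 2
    weights = []
    for c in range(num_channels):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = c + j - half
            if 0 <= idx < num_channels:
                acc += k * pooled[idx]
        weights.append(1.0 / (1.0 + math.exp(-acc)))  # sigmoid gate in (0, 1)
    # Rescale: multiply every value in a channel by that channel's weight.
    scaled = [
        [[value * w for value in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]
    return scaled, weights


# Three 2x2 channels with means 1, 0 and -2; the kernel here is arbitrary.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[-2.0, -2.0], [-2.0, -2.0]],
]
scaled, weights = channel_attention(features, kernel=[0.0, 1.0, 0.0])
```

With the pass-through kernel `[0.0, 1.0, 0.0]`, each weight is simply the sigmoid of that channel's own mean; a learned kernel would instead mix information from adjacent channels before gating. The output keeps the input shape, which is what lets a block like this be dropped between existing layers.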


Pages: 10
