Current detection networks usually struggle to detect small-scale object instances owing to spatial information loss and a lack of semantics. In this paper, we propose a one-stage detector named LocalNet, which pays specific attention to modeling detailed information. LocalNet is built upon our redesigned detection-oriented backbone, called long-neck ResNet, which preserves more detailed information in the early stages to enhance the representation of small objects. Furthermore, to enrich the semantics in the detection layers, we propose a local detail-context module, which reintroduces the detailed information lost in the network and exploits the local context within a restricted receptive field. Moreover, we explore a method for training detectors nearly or entirely from scratch, which allows network structures to be designed with greater freedom. With nearly 94% of the pretrained backbone parameters randomly reinitialized, our model improves the mAP of our baseline model from 75.0% to 82.3% on the PASCAL VOC dataset with an input size of 300×300, achieving state-of-the-art accuracy. Even when trained entirely from scratch, our model achieves 80.8% mAP, which is 5.8 points higher than the mAP of our baseline model with a fully pretrained backbone.