Current detection networks usually struggle to detect small-scale object instances owing to spatial information loss and a lack of semantics. In this paper, we propose a one-stage detector named LocalNet, which pays specific attention to modeling detailed information. LocalNet is built upon our redesigned detection-oriented backbone, called long-neck ResNet, which preserves more detailed information in the early stages to enhance the representation of small objects. Furthermore, to enrich the semantics in the detection layers, we propose a local detail-context module, which reintroduces the detailed information lost in the network and exploits the local context within a restricted receptive field. Moreover, we explore a method for training detectors nearly or entirely from scratch, which allows network structures to be designed with greater freedom. With nearly 94% of the pretrained backbone parameters randomly reinitialized, our model improves the mAP of our baseline model from 75.0% to 82.3% on the PASCAL VOC dataset with an input size of 300×300, achieving state-of-the-art accuracy. Even when trained entirely from scratch, our model achieves 80.8% mAP, which is 5.8 points higher than the mAP of our baseline model with a fully pretrained backbone.