Deep Learning for Multi-scale Object Detection: A Survey

Cited by: 0
Authors
Chen K.-Q. [1,2,3]
Zhu Z.-L. [2,3,4]
Deng X.-M. [2,3]
Ma C.-X. [1,2,3]
Wang H.-A. [1,2,3]
Affiliations
[1] School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing
[2] State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing
[3] Beijing Key Laboratory of Human-computer Interaction, Institute of Software, Chinese Academy of Sciences, Beijing
[4] School of Software, East China Jiaotong University, Nanchang
Source
Ruan Jian Xue Bao/Journal of Software | 2021 / Vol. 32 / No. 4
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Deep learning; Multi-scale feature; Object detection; Scale problem
DOI
10.13328/j.cnki.jos.006166
Abstract
Object detection is a classic computer vision task that aims to detect multiple objects of given classes in an image and localize each of them with a bounding box. With the rapid development of neural network technology and the milestone R-CNN detector, a series of deep-learning-based object detectors have been developed in recent years, showing overwhelming advantages in speed and accuracy over traditional algorithms. However, precisely detecting objects with large scale variation, known as the scale problem, remains a great challenge even for deep learning methods, although many researchers have contributed to it over the last few years. While dozens of surveys already summarize deep-learning-based object detectors from several aspects, including algorithm pipeline, network structure, training, and datasets, very few of them concentrate on methods for multi-scale object detection. Therefore, this paper first reviews the foundations of deep-learning-based detectors along their two main streams: two-stage detectors such as R-CNN and one-stage detectors such as YOLO and SSD. It then discusses effective approaches to the scale problem, including the most commonly used image pyramids and in-network feature pyramids, among others. Finally, it summarizes the current state of multi-scale object detection and outlines future research directions. © Copyright 2021, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
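As a minimal illustration of the image-pyramid idea named in the abstract, the sketch below repeatedly halves a grayscale image (stored as a list of lists) using 2×2 average pooling, so that a detector trained at a single scale could be run on every level. The function names `downsample_2x` and `build_image_pyramid`, the 2×2 box filter, and the `min_size` stopping rule are hypothetical choices for illustration, not the authors' method; practical pipelines usually smooth with a Gaussian kernel and resize with an image library.

```python
from typing import List

Image = List[List[float]]


def downsample_2x(img: Image) -> Image:
    """Halve height and width by averaging each non-overlapping 2x2 block.

    Odd trailing rows/columns are dropped (a simplifying assumption).
    """
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def build_image_pyramid(img: Image, min_size: int = 2) -> List[Image]:
    """Return [original, 1/2 scale, 1/4 scale, ...] while both sides stay >= min_size."""
    levels = [img]
    while len(levels[-1]) // 2 >= min_size and len(levels[-1][0]) // 2 >= min_size:
        levels.append(downsample_2x(levels[-1]))
    return levels


if __name__ == "__main__":
    # A toy 8x8 "image" whose pixel value equals its column index.
    image = [[float(c) for c in range(8)] for _ in range(8)]
    for level in build_image_pyramid(image):
        # A fixed-size detector would scan each level: objects that are too
        # large for its window in the original shrink to fit at coarser levels.
        print(f"{len(level)}x{len(level[0])}", level[0])
```

Running the script prints one line per level (8x8, 4x4, 2x2). Because the detector must be applied once per level, this approach multiplies inference cost, which is one motivation for the in-network feature pyramids the abstract also mentions.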
Pages: 1201-1227
Number of pages: 26