ssFPN: Scale Sequence (S2) Feature-Based Feature Pyramid Network for Object Detection

Cited by: 21
Authors
Park, Hye-Jin [1 ]
Kang, Ji-Woo [1 ]
Kim, Byung-Gyu [1 ]
Affiliations
[1] Sookmyung Womens Univ, Dept Artificial Intelligence Engn, 100 Chungpa Ro 47 Gil, Seoul 04310, South Korea
Funding
National Research Foundation of Singapore;
Keywords
object detection; feature pyramid network; scale sequence (S-2) feature; convolutional neural network (CNN); deep learning; SUPERRESOLUTION; ARCHITECTURE;
DOI
10.3390/s23094432
CLC number
O65 [Analytical Chemistry];
Subject classification codes
070302; 081704;
Abstract
Object detection is a fundamental task in computer vision. Over the past several years, convolutional neural network (CNN)-based object detection models have significantly improved detection accuracy in terms of average precision (AP). Furthermore, feature pyramid networks (FPNs) are essential modules that allow object detection models to handle various object scales. However, the AP for small objects is lower than that for medium and large objects. Small objects are difficult to recognize because they carry little information, and that information is lost in deeper CNN layers. This paper proposes a new FPN model named ssFPN (scale sequence (S2) feature-based feature pyramid network) to detect multi-scale objects, especially small objects. Motivated by scale-space theory, the FPN is regarded as a scale space, and a new scale sequence (S2) feature is extracted by three-dimensional convolution along the level axis of the FPN to strengthen the information on small objects. The defined feature is essentially scale-invariant and is built on a high-resolution pyramid feature map for small objects. Additionally, the designed S2 feature can be extended to most FPN-based object detection models. We also designed a feature-level super-resolution approach to show the efficiency of the scale sequence (S2) feature: by training a feature-level super-resolution model, we verified that the scale sequence (S2) feature can improve classification accuracy on low-resolution images. To demonstrate the effect of the scale sequence (S2) feature, experiments with the scale sequence (S2) feature built into both one-stage and two-stage object detection models were conducted on the MS COCO dataset.
For the two-stage object detection models Faster R-CNN and Mask R-CNN with the S2 feature, AP improvements of up to 1.6% and 1.4%, respectively, were achieved, and the AP for small objects (APS) of each model improved by 1.2% and 1.1%, respectively. The one-stage object detection models of the YOLO series were also improved: for YOLOv4-P5, YOLOv4-P6, YOLOR-P6, YOLOR-W6, and YOLOR-D6 with the S2 feature, AP improvements of 0.9%, 0.5%, 0.5%, 0.1%, and 0.1% were observed, and for small object detection, the APS increased by 1.1%, 1.1%, 0.9%, 0.4%, and 0.1%, respectively. Experiments using the feature-level super-resolution approach with the proposed scale sequence (S2) feature were conducted on the CIFAR-100 dataset. By training the feature-level super-resolution model, we verified that ResNet-101 with the S2 feature trained on low-resolution (LR) images achieved a 55.2% classification accuracy, which was 1.6% higher than that of ResNet-101 trained on high-resolution (HR) images.
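The scale-space construction described in the abstract (resizing the FPN level maps to a common resolution, stacking them along a new level axis, and applying a 3D convolution across that axis) can be sketched as follows. This is an illustrative NumPy reconstruction, not the authors' implementation: the function names and the nearest-neighbor resizing are assumptions, and a real model would use learned 3D-convolution kernels inside a deep-learning framework rather than a fixed kernel.

```python
import numpy as np

def upsample_nearest(fmap, out_h, out_w):
    # Nearest-neighbor resize of a (C, H, W) feature map to (C, out_h, out_w).
    c, h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return fmap[:, rows][:, :, cols]

def scale_sequence_feature(pyramid, kernel):
    """Build a scale space from FPN levels and apply a small 3D convolution.

    pyramid: list of (C, H_i, W_i) arrays, highest resolution first.
    kernel:  (D, kh, kw) array, D = number of pyramid levels, applied
             over the (level, height, width) axes; the level axis is
             collapsed, yielding one (C, H, W) output map.
    """
    c, H, W = pyramid[0].shape
    # Resize every level to the highest resolution and stack along a
    # new "level" axis -> scale-space volume of shape (C, D, H, W).
    volume = np.stack([upsample_nearest(p, H, W) for p in pyramid], axis=1)
    D, kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(volume, ((0, 0), (0, 0), (ph, ph), (pw, pw)))
    out = np.zeros((c, H, W))
    # Naive 3D convolution: "same" over H and W, "valid" over the level
    # axis, so the D levels are fused into a single high-resolution map.
    for i in range(H):
        for j in range(W):
            patch = padded[:, :, i:i + kh, j:j + kw]   # (C, D, kh, kw)
            out[:, i, j] = np.tensordot(patch, kernel,
                                        axes=([1, 2, 3], [0, 1, 2]))
    return out
```

Under these assumptions, the output has the spatial resolution of the finest pyramid level, which is why the resulting feature favors small objects.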
Pages: 19