Multi-Scale Residual Aggregation Feature Pyramid Network for Object Detection

Cited by: 5
Authors
Wang, Hongyang [1 ]
Wang, Tiejun [2 ]
Affiliations
[1] Northwest Minzu Univ, Key Lab Chinas Ethn Languages & Informat Technol, Minist Educ, Lanzhou 730000, Peoples R China
[2] Northwest Minzu Univ, Sch Math & Comp Sci, Lanzhou 730000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
object detection; feature pyramid network; Thangka image;
DOI
10.3390/electronics12010093
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The effective use of multi-scale features remains an open problem in object detection. Recently proposed object detectors usually use Feature Pyramid Networks (FPN) to fuse multi-scale features. Because FPN relies on a relatively simple feature-map fusion approach, semantic information can be lost or misaligned during fusion. Several works have demonstrated that adding a bottom-up structure to an FPN shortens the information path between the lower layers and the topmost feature, allowing an adequate exchange of semantic information between layers. We further enhance the bottom-up path by proposing a multi-scale residual aggregation Feature Pyramid Network (MSRA-FPN), which uses a unidirectional cross-layer residual module to aggregate features from multiple layers bottom-up, in a triangular structure, into the topmost layer. In addition, we introduce a Residual Squeeze and Excitation Module to mitigate the aliasing effects that occur when features from different layers are aggregated. MSRA-FPN enhances the semantic information of the high-level feature maps, mitigates information decay during feature fusion, and improves the model's ability to detect large objects. Experiments show that MSRA-FPN improves the performance of three baseline models by 0.5-1.9% on the PASCAL VOC dataset and is competitive with other state-of-the-art FPN methods. On the MS COCO dataset, it improves the performance of the baseline model by 0.8% and its large-object detection performance by 1.8%. To further validate the effectiveness of MSRA-FPN for large object detection, we constructed the Thangka Figure Dataset and conducted comparative experiments. On this dataset, our method improves the performance of the baseline model by 2.9-4.7%, reaching up to 71.2%.
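The abstract names a Residual Squeeze and Excitation Module but does not give its layout. As a rough illustration only, the sketch below assumes the standard squeeze-and-excitation gating (global average pooling, two small dense layers, sigmoid gates) wrapped in an identity skip connection. The function names, weight shapes, and the `x + x * gate` form are assumptions made for this example and may differ from the authors' module.

```python
import math

def squeeze(feature):
    """Global average pool: reduce each channel (a 2D list) to one scalar."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]

def dense(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def residual_se(feature, w1, b1, w2, b2):
    """Hypothetical residual SE block: output = x + x * gate, per channel.

    feature: list of channels, each a list of rows of floats.
    w1, b1:  reduction layer (channels -> hidden units), followed by ReLU.
    w2, b2:  expansion layer (hidden units -> channels), followed by sigmoid.
    """
    z = squeeze(feature)                                  # squeeze step
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]      # ReLU
    gates = [1.0 / (1.0 + math.exp(-g))                   # sigmoid gates
             for g in dense(hidden, w2, b2)]
    # Excitation plus identity skip: the input is kept and the gated copy added.
    return [[[v + v * g for v in row] for row in ch]
            for ch, g in zip(feature, gates)]

# Two channels of size 2x2; all-zero weights give every channel a gate of 0.5.
x = [[[2.0, 4.0], [6.0, 8.0]],
     [[1.0, 1.0], [1.0, 1.0]]]
y = residual_se(x, w1=[[0.0, 0.0]], b1=[0.0], w2=[[0.0], [0.0]], b2=[0.0, 0.0])
```

With zero weights, each gate is sigmoid(0) = 0.5, so every value is scaled by 1.5. The skip connection means the block can only reweight the input on top of passing it through, which is one plausible reading of "residual" here.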
Pages: 18