Attention-based scale sequence network for small object detection

Cited by: 2
Authors
Lee, Young-Woon [1]
Kim, Byung-Gyu [2]
Affiliations
[1] Sunmoon Univ, Dept Comp Engn, Asan, South Korea
[2] Sookmyung Womens Univ, Div Artificial Intelligence Engn, Seoul, South Korea
Keywords
Small object detection; Feature pyramid network; Scale sequence; Attention mechanism; Deep learning
DOI
10.1016/j.heliyon.2024.e32931
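Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]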
Discipline classification code
07; 0710; 09
Abstract
Recent advances in deep learning continue to push the state of the art across many areas of computer vision, and object recognition in particular has received the most attention. Nevertheless, recognizing small objects remains challenging, even though it is of utmost importance in real-world applications such as searching for missing persons in aerial photography. The core structure of an object recognition neural network is the feature pyramid network (FPN), and You Only Look Once (YOLO) is the most widely used representative model that follows this structure. In this study, we propose an attention-based scale sequence network (ASSN) that improves on the scale sequence feature pyramid network (ssFPN) and enhances the performance of FPN-based detectors on small objects. ASSN is a lightweight attention module optimized for FPN-based detectors and can be applied to any model with a corresponding structure. Compared with the baselines (YOLOv7 and YOLOv8), the proposed ASSN improved average precision (AP) by up to 0.6%, and AP for small objects (AP_s) by up to 1.9%. Furthermore, ASSN outperforms ssFPN while being more lightweight and better optimized, thereby reducing computational complexity and improving processing speed. ASSN is released as open source for YOLOv7 and YOLOv8 in our public repository: https://github.com/smu-ivpl/ASSN.git
Pages: 11