Multi-Scale Structure Perception and Global Context-Aware Method for Small-Scale Pedestrian Detection

Cited by: 1
Authors
Gao, Hao [1 ]
Huang, Shucheng [1 ]
Li, Mingxing [2 ]
Li, Tian [3 ]
Affiliations
[1] Jiangsu Univ Sci & Technol, Sch Comp, Zhenjiang 212003, Peoples R China
[2] Jiangsu Univ, Jingjiang Coll, Zhenjiang 212013, Peoples R China
[3] Jiangsu Univ Sci & Technol, Suzhou Inst Technol, Suzhou 215699, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Pedestrians; Feature extraction; Detectors; Transformers; YOLO; Semantics; Proposals; Context modeling; Detection algorithms; Identification of persons; Context information; self-attention; small-scale pedestrian detection; Transformer; OBJECT DETECTION; NETWORK; NMS;
DOI
10.1109/ACCESS.2024.3406968
Chinese Library Classification (CLC) Number
TP [Automation and Computer Technology];
Discipline Code
0812;
Abstract
In pedestrian detection, small-scale pedestrians often suffer from limited pixel coverage and insufficient features, which frequently leads to false or missed detections. This paper therefore proposes a multi-scale structure perception and global context-aware method for small-scale pedestrian detection. First, to address the loss of small-scale features as the network deepens, we design a feature fusion strategy that overcomes the constraints of the feature pyramid hierarchy. The strategy combines deep and shallow feature maps, leverages the Transformer to capture long-range dependencies, and incorporates a global context information module to retain a substantial amount of small-scale pedestrian features. Second, because small-scale pedestrian features are easily confused with background information, we employ a combination of self-attention and channel attention modules to jointly model the spatial and channel correlations of feature maps; exploiting the context and channel information of small-scale pedestrians enhances their features while suppressing background information. Finally, to address gradient explosion during model training, we introduce a novel weighted loss function, ES-IoU, which significantly improves convergence speed. Extensive experimental results on the CityPersons and CrowdHuman datasets demonstrate that the proposed method achieves substantial improvements over state-of-the-art methods.
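As a concrete illustration of the joint spatial/channel attention described in the abstract, the following PyTorch sketch combines a single-head spatial self-attention block with an SE-style channel attention block. The module names, the single-head formulation, and the SE-style channel branch are assumptions for illustration only; the paper's exact design is not specified in the abstract.

# Minimal sketch of jointly modeling spatial correlations (self-attention)
# and channel correlations (channel attention) on a feature map.
# All module names and design details below are illustrative assumptions.
import torch
import torch.nn as nn


class SpatialSelfAttention(nn.Module):
    """Single-head self-attention over the H*W spatial positions of a feature map."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)            # (B, HW, C/8)
        k = self.key(x).flatten(2)                               # (B, C/8, HW)
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), -1)   # (B, HW, HW)
        v = self.value(x).flatten(2)                             # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                              # residual connection


class ChannelAttention(nn.Module):
    """SE-style channel attention: reweight channels using global context."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(x)  # emphasize informative channels, suppress background


class JointAttentionBlock(nn.Module):
    """Apply spatial self-attention followed by channel attention."""

    def __init__(self, channels: int):
        super().__init__()
        self.spatial = SpatialSelfAttention(channels)
        self.channel = ChannelAttention(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.channel(self.spatial(x))


if __name__ == "__main__":
    feat = torch.randn(2, 256, 40, 40)           # e.g. one pyramid level
    print(JointAttentionBlock(256)(feat).shape)  # torch.Size([2, 256, 40, 40])

In practice such a block would be applied to the fused feature maps before the detection head; the spatial branch models long-range context while the channel branch suppresses background-dominated channels, matching the intent (though not necessarily the exact structure) described by the authors.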
Pages: 76392-76403
Number of pages: 12