Multi-Scale Structure Perception and Global Context-Aware Method for Small-Scale Pedestrian Detection

Cited by: 1
Authors
Gao, Hao [1 ]
Huang, Shucheng [1 ]
Li, Mingxing [2 ]
Li, Tian [3 ]
Affiliations
[1] Jiangsu Univ Sci & Technol, Sch Comp, Zhenjiang 212003, Peoples R China
[2] Jiangsu Univ, Jingjiang Coll, Zhenjiang 212013, Peoples R China
[3] Jiangsu Univ Sci & Technol, Suzhou Inst Technol, Suzhou 215699, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Pedestrians; Feature extraction; Detectors; Transformers; YOLO; Semantics; Proposals; Context modeling; Detection algorithms; Identification of persons; Context information; self-attention; small-scale pedestrian detection; Transformer; OBJECT DETECTION; NETWORK; NMS;
DOI
10.1109/ACCESS.2024.3406968
Chinese Library Classification (CLC) Number
TP [Automation and Computer Technology];
Discipline Code
0812;
Abstract
In pedestrian detection, small-scale pedestrians often suffer from limited pixel coverage and insufficient features, which frequently leads to false or missed detections. This paper therefore proposes a multi-scale structure perception and global context-aware method for small-scale pedestrian detection. First, to address the loss of small-scale features as the network deepens, we design a feature fusion strategy that overcomes the constraints of the feature pyramid hierarchy. The strategy combines deep and shallow feature maps, leverages the Transformer to capture long-range dependencies, and incorporates a global context information module to retain a substantial amount of small-scale pedestrian features. Second, because small-scale pedestrian features are easily confused with background information, we employ a combination of self-attention and channel attention modules to jointly model the spatial and channel correlations of feature maps; exploiting the context and channel information of small-scale pedestrians enhances their features while suppressing background information. Finally, to address gradient explosion during model training, we introduce a novel weighted loss function, ES-IoU, which significantly improves convergence speed. Extensive experimental results on the CityPersons and CrowdHuman datasets demonstrate that the proposed method achieves substantial improvements over state-of-the-art methods.
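As a concrete illustration of the joint spatial/channel attention described in the abstract, the following PyTorch sketch combines a single-head spatial self-attention block with an SE-style channel attention block. The module names, the single-head formulation, and the SE-style channel branch are assumptions for illustration only; the paper's exact design is not specified in the abstract.

# Minimal sketch of jointly modeling spatial correlations (self-attention)
# and channel correlations (channel attention) on a feature map.
# All module names and design details below are illustrative assumptions.
import torch
import torch.nn as nn


class SpatialSelfAttention(nn.Module):
    """Single-head self-attention over the H*W spatial positions of a feature map."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)            # (B, HW, C/8)
        k = self.key(x).flatten(2)                               # (B, C/8, HW)
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), -1)   # (B, HW, HW)
        v = self.value(x).flatten(2)                             # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                              # residual connection


class ChannelAttention(nn.Module):
    """SE-style channel attention: reweight channels using global context."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(x)  # emphasize informative channels, suppress background


class JointAttentionBlock(nn.Module):
    """Apply spatial self-attention followed by channel attention."""

    def __init__(self, channels: int):
        super().__init__()
        self.spatial = SpatialSelfAttention(channels)
        self.channel = ChannelAttention(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.channel(self.spatial(x))


if __name__ == "__main__":
    feat = torch.randn(2, 256, 40, 40)           # e.g. one pyramid level
    print(JointAttentionBlock(256)(feat).shape)  # torch.Size([2, 256, 40, 40])

In practice such a block would be applied to the fused feature maps before the detection head; the spatial branch models long-range context while the channel branch suppresses background-dominated channels, matching the intent (though not necessarily the exact structure) described by the authors.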
Pages: 76392-76403
Number of pages: 12