HF-YOLO: Advanced Pedestrian Detection Model with Feature Fusion and Imbalance Resolution

Times cited: 13

Authors:

Pan, Lihu [1]

Diao, Jianzhong [1]

Wang, Zhengkui [2]

Peng, Shouxin [1]

Zhao, Cunhui [3]

Affiliations:

[1] Taiyuan Univ Sci & Technol, Sch Comp Sci & Technol, 63 Waliu Rd, Taiyuan 030024, Shanxi, Peoples R China

[2] Singapore Inst Technol, ICT Cluster, 10 Dover Dr, Singapore City 139651, Singapore

[3] Jingying Shuzhi Technol Co Ltd, 103 Changzhi Rd, Taiyuan 030012, Shanxi, Peoples R China

Source:

NEURAL PROCESSING LETTERS | 2024 / Vol. 56 / No. 02

Keywords:

Pedestrian detection; Object detection; Activation function; YOLO; Loss function

DOI:

10.1007/s11063-024-11558-4

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Pedestrian detection is crucial for many applications, including intelligent transportation and video surveillance systems. Although recent research has advanced pedestrian detection models such as the YOLO series, they still struggle with the wide range of pedestrian scales, which limits their performance. To address this, we propose HF-YOLO, a pedestrian detection model that targets two difficulties of complex scenes: scale variation and occlusion among pedestrians. In the feature fusion stage, the algorithm combines shallow localization information with deep semantic information by fusing P2 layer features and adding a high-resolution detection layer, which significantly improves the detection of small-scale and occluded pedestrians. To enhance feature representation, HF-YOLO adopts the HardSwish activation function, which introduces more non-linearity and strengthens the model's ability to represent complex, discriminative features. In addition, to address regression imbalance, a balance factor is introduced into the CIoU loss function; this modification effectively resolves the imbalance problem and improves pedestrian localization accuracy. Experimental results demonstrate the effectiveness of the proposed algorithm: HF-YOLO improves average precision by 3.52%, accuracy by 1.35%, and recall by 4.83%. Moreover, it maintains real-time performance with a detection time of 8.5 ms, meeting the stringent requirements of real-time applications.
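The abstract names three standard building blocks: a high-resolution P2 detection layer, the HardSwish activation, and the CIoU box-regression loss. The Python sketch below shows their generic textbook forms, not the authors' code. Assumptions: boxes are given as (x1, y1, x2, y2); the input is 640 pixels; P2 to P5 follow the common YOLO stride convention of 4, 8, 16 and 32; `grid_sizes` is a hypothetical helper. The abstract does not state the form of the balance factor HF-YOLO adds to CIoU, so `ciou_loss` is plain CIoU without it.

```python
import math

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def hardswish(x: float) -> float:
    """HardSwish(x) = x * ReLU6(x + 3) / 6."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def ciou_loss(pred: Box, target: Box, eps: float = 1e-7) -> float:
    """Standard CIoU loss: 1 - IoU + rho^2 / c^2 + alpha * v.

    HF-YOLO's balance factor is NOT included (its form is not given
    in the abstract).
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between the two box centres.
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4.0

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (
        math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / (v - iou + 1.0 + eps)

    return 1.0 - iou + rho2 / c2 + alpha * v


def grid_sizes(img_size: int = 640, strides: tuple[int, ...] = (4, 8, 16, 32)) -> list[int]:
    """Hypothetical helper: side length of each detection grid (P2..P5)."""
    return [img_size // s for s in strides]


if __name__ == "__main__":
    print(grid_sizes(640))                         # P2 adds the finest 160x160 grid
    print(ciou_loss((0, 0, 2, 2), (1, 0, 3, 2)))   # partially overlapping boxes
```

The grid arithmetic shows why a P2 head helps small pedestrians: at a 640-pixel input it predicts on a 160x160 grid, where each cell covers a 4x4-pixel patch, compared with 8x8 pixels for P3. For two equally shaped boxes the aspect-ratio term is zero, so CIoU reduces to 1 - IoU plus the normalized centre-distance penalty.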

Pages: 20

References: 47
