HFMIDet: Hierarchical Feature Fusion-Guided Multidimensional Infrared Pedestrian Detection Network

Cited by: 0

Authors:

Liu, Yang [1]

Zhang, Ming [1]

Fan, Fei [1]

Yu, Dahua [1]

Li, Jianjun [1]

Affiliations:

[1] Inner Mongolia Univ Sci & Technol, Sch Digital & Intelligent Ind, Baotou 014010, Peoples R China

Source:

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT | 2025 / Vol. 74

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Frequency-domain analysis; Pedestrians; Discrete cosine transforms; Head; Training; Data mining; Adaptation models; Transformers; Noise reduction; Denoising detection head; frequency-domain feature; hierarchical feature fusion; infrared images; pedestrian detection;

DOI:

10.1109/TIM.2025.3573003

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Infrared images are favored for their exceptional anti-interference capabilities. However, challenges such as low resolution and a scarcity of detailed textures can impede the effective recognition of multiscale object information in infrared imaging. To address these issues, we designed the hierarchical feature fusion-guided multidimensional infrared pedestrian detection network (HFMIDet), focusing on extracting richer details and global features within complex scenes. First, we design a hierarchical feature fusion network (HFFNet), which uses a multiscale fusion module to achieve cross-layer feature combination and a multilevel information fusion module (MLIF) to achieve multilevel feature fusion, so as to enhance the ability of the model to perceive the target location. In addition, the frequency-spatial feature enhancement module (FSFEM) aims to effectively suppress background noise by combining frequency and spatial domain feature information so that the network can extract effective object shape and global frequency information even in complex backgrounds. Finally, we use the adaptive denoising transformer head (ADTH) for the final detection task, while Focal CIoU is used to perform the loss feedback task. Experimental results on three public datasets of infrared pedestrians show that our model can achieve superior performance and outperform many state-of-the-art methods.
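The abstract names Focal CIoU as the bounding-box regression loss but does not define it. As a rough illustration only, the sketch below computes the standard CIoU loss for two axis-aligned boxes and then applies an IoU-based focal weight of the form IoU^gamma, as used in Focal-EIoU-style losses. The function names, the `(x1, y1, x2, y2)` box format, the weighting form, and the default `gamma` are assumptions made for this example, not details taken from the paper.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """Return (CIoU loss, IoU) for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection over union of the two boxes.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + eps)

    # Squared distance between box centers.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan((tx2 - tx1) / (ty2 - ty1 + eps))
        - math.atan((px2 - px1) / (py2 - py1 + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v, iou


def focal_ciou_loss(pred, target, gamma=0.5):
    """Assumed focal form: scale the CIoU loss by IoU ** gamma."""
    loss, iou = ciou_loss(pred, target)
    return (iou ** gamma) * loss
```

With this assumed weighting, boxes that overlap the target more contribute more to the loss, and a prediction with no overlap at all receives zero weight; other focal formulations behave differently, so the paper should be consulted for the exact definition.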

Pages: 13
