Attention Based Multi-Layer Fusion of Multispectral Images for Pedestrian Detection

Cited by: 45
Authors
Zhang, Yongtao [1]
Yin, Zhishuai [1, 2]
Nie, Linzhen [1, 2]
Huang, Song [1]
Affiliations
[1] Wuhan Univ Technol, Sch Automot Engn, Wuhan 430070, Peoples R China
[2] Wuhan Univ Technol, Hubei Key Lab Adv Technol Automot Components, Wuhan 430070, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Feature extraction; Proposals; Fuses; Streaming media; Detectors; Convolutional neural networks; Saliency detection; pedestrian detection; image fusion; deep learning; DEEP NEURAL-NETWORKS; CNN;
DOI
10.1109/ACCESS.2020.3022623
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Multispectral images are increasingly used for pedestrian detection. Preliminary fusion strategies may fail to exploit informative features from cross-spectral images or, worse, may introduce additional interference. In this paper, we propose an attention-based multi-layer fusion network in a triple-stream deep convolutional neural network architecture for multispectral pedestrian detection. The effectiveness of multi-layer fusion is examined and verified in this work. Furthermore, a channel-wise attention module (CAM) and a spatial-wise attention module (SAM) are developed and incorporated into the network to adjust the weights of multispectral features more finely along the channel and spatial dimensions, respectively. Channel-wise attention is trained with self-supervision, while spatial-wise attention is trained with external supervision, as we remodel its learning process as saliency detection. The two attention-based weighting mechanisms are evaluated separately and then sequentially. Experimental results on the KAIST dataset show that the proposed multi-layer cross-spectral fusion R-CNN (CS-RCNN), with spatial-wise weighting applied alone, achieves state-of-the-art performance on all-day detection while outperforming the compared methods at nighttime.
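The two weighting steps named in the abstract can be illustrated with a minimal sketch. This is not the paper's CAM or SAM: the record gives no layer details, so the channel gate here is a parameter-free sigmoid of each channel's mean (an assumption standing in for learned weights), and the saliency map is supplied by hand rather than predicted by a supervised branch. All function names are hypothetical.

```python
import math

def channel_weights(features):
    """Squeeze each channel to its mean, then gate it with a sigmoid.
    Assumption: a learned module would replace this fixed gate."""
    weights = []
    for channel in features:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        weights.append(1.0 / (1.0 + math.exp(-mean)))
    return weights

def apply_channel_attention(features):
    """Scale every value in a channel by that channel's weight."""
    weights = channel_weights(features)
    return [[[v * w for v in row] for row in channel]
            for channel, w in zip(features, weights)]

def apply_spatial_attention(features, saliency):
    """Scale every channel by one shared H x W saliency map in [0, 1].
    Assumption: the map is given; in the paper it is learned as saliency detection."""
    return [[[v * s for v, s in zip(row, srow)]
             for row, srow in zip(channel, saliency)]
            for channel in features]

# Toy fused feature map: 2 channels on a 2 x 2 spatial grid.
fused = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
# Hand-made saliency map: 1.0 keeps a location, 0.0 suppresses it.
saliency = [[1.0, 0.0], [0.5, 1.0]]

# Sequential weighting: channel-wise first, then spatial-wise.
reweighted = apply_spatial_attention(apply_channel_attention(fused), saliency)
```

Applying only `apply_spatial_attention` corresponds to the "spatial-wise weighting applied alone" configuration that the abstract reports as the best performer, while the chained call mirrors the sequential evaluation.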
Pages: 165071-165084
Page count: 14