Attention Based Multi-Layer Fusion of Multispectral Images for Pedestrian Detection

Cited by: 45

Authors:

Zhang, Yongtao [1]

Yin, Zhishuai [1,2]

Nie, Linzhen [1,2]

Huang, Song [1]

Affiliations:

[1] Wuhan University of Technology, School of Automotive Engineering, Wuhan 430070, People's Republic of China

[2] Wuhan University of Technology, Hubei Key Laboratory of Advanced Technology for Automotive Components, Wuhan 430070, People's Republic of China

Source:

IEEE Access | 2020 / Volume 8

Funding:

U.S. National Science Foundation;

Keywords:

Feature extraction; proposals; fuses; streaming media; detectors; convolutional neural networks; saliency detection; pedestrian detection; image fusion; deep learning; deep neural networks; CNN

DOI:

10.1109/ACCESS.2020.3022623

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Multispectral images are increasingly used for pedestrian detection. Preliminary fusion strategies can fail to exploit informative features from cross-spectral images or, worse, may introduce additional interference. In this paper, we propose an attention-based multi-layer fusion network built on a triple-stream deep convolutional neural network architecture for multispectral pedestrian detection. The effectiveness of multi-layer fusion is examined and verified in this work. Furthermore, a channel-wise attention module (CAM) and a spatial-wise attention module (SAM) are developed and incorporated into the network to adjust the weights of multispectral features more finely along the channel and spatial dimensions, respectively. Channel-wise attention is trained with self-supervision, while spatial-wise attention is trained with external supervision, as we recast its learning process as saliency detection. The two attention-based weighting mechanisms are evaluated separately and then sequentially. Experimental results on the KAIST dataset show that the proposed multi-layer cross-spectral fusion R-CNN (CS-RCNN), with spatial-wise weighting applied alone, achieves state-of-the-art performance on all-day detection while outperforming the compared methods at nighttime.
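
The two weighting mechanisms named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the sigmoid-of-mean gate standing in for a learned channel-wise module, and the hand-written saliency map standing in for a supervised spatial-wise module are all assumptions made for illustration. Feature maps are represented as nested lists of shape channels x height x width.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(visible, thermal):
    """Concatenate two feature maps along the channel axis."""
    return visible + thermal


def channel_weights(features):
    """One weight per channel, computed from its global average.

    Assumption: a real module would learn this gate; the plain
    sigmoid of the channel mean is only a stand-in.
    """
    weights = []
    for channel in features:
        values = [v for row in channel for v in row]
        weights.append(sigmoid(sum(values) / len(values)))
    return weights


def apply_channel_attention(features):
    """Scale every value in a channel by that channel's weight."""
    weights = channel_weights(features)
    return [[[v * w for v in row] for row in channel]
            for channel, w in zip(features, weights)]


def apply_spatial_attention(features, saliency):
    """Scale every channel by a height x width saliency map in [0, 1].

    Assumption: the map would come from a saliency-prediction branch;
    here it is supplied by hand.
    """
    return [[[v * s for v, s in zip(row, srow)]
             for row, srow in zip(channel, saliency)]
            for channel in features]


# Toy inputs: one visible-light channel and one thermal channel, 2 x 2 each.
visible = [[[1.0, 2.0], [3.0, 4.0]]]
thermal = [[[0.0, 0.0], [0.0, 0.0]]]
saliency = [[1.0, 0.0], [0.5, 1.0]]

fused = fuse(visible, thermal)
# Sequential weighting: channel-wise first, then spatial-wise.
weighted = apply_spatial_attention(apply_channel_attention(fused), saliency)
```

The sketch applies the two weightings sequentially; per the abstract, the authors also evaluate each one on its own and report the best results with spatial-wise weighting alone, which here would correspond to calling only `apply_spatial_attention(fused, saliency)`.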

Citation

Pages: 165071-165084

Page count: 14

