Soft Spatial Attention-Based Multimodal Driver Action Recognition Using Deep Learning

Cited by: 34

Authors:

Jegham, Imen ^{[1]}

Ben Khalifa, Anouar ^{[2]}

Alouani, Ihsen ^{[3]}

Mahjoub, Mohamed Ali ^{[2]}

Affiliations:

[1] Univ Sousse, Inst Super Informat & Tech Commun H Sousse, LATIS, Sousse 4011, Tunisia

[2] Univ Sousse, Ecole Natl Ingenieurs Sousse, LATIS, Sousse 4023, Tunisia

[3] Univ Polytech Hauts De France, IEMN DOAE, F-59300 Valenciennes, France

Source:

IEEE SENSORS JOURNAL | 2021 / Vol. 21 / Issue 02

Keywords:

Vehicles; Sensors; Visualization; Computational modeling; Machine learning; Monitoring; Task analysis; Driver action recognition; Kinect sensor; spatial soft attention; multimodal; deep learning; CLASSIFICATION; NETWORKS; KINECT

DOI:

10.1109/JSEN.2020.3019258

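Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Subject classification codes:

0808; 0809

Abstract: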

Driver behaviors and decisions are crucial factors for on-road driving safety. With a precise driver behavior monitoring system, traffic accidents and injuries can be significantly reduced. However, understanding human behavior in real-world driving settings is challenging because of uncontrolled conditions, including illumination variation, occlusion, and dynamic, cluttered backgrounds. In this paper, a Kinect sensor, which provides multimodal signals, is adopted as a driver monitoring sensor to recognize safe driving and the most common distracting secondary in-vehicle actions. We propose a novel soft spatial attention-based network named the Depth-based Spatial Attention network (DSA), which adds a cognitive process to the deep network by selectively focusing on the driver's silhouette and motion in the cluttered driving scene. Specifically, at each time step t, we introduce a new weighted RGB frame based on an attention model designed using the depth frame. The final classification accuracy is substantially higher than state-of-the-art results, with an improvement of up to 27%.
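
The idea of weighting each RGB frame by a depth-derived attention map can be illustrated with a minimal sketch. This is not the DSA network itself: the abstract does not say how the attention weights are computed, so the linear near/far ramp below, the function names `depth_soft_attention` and `weight_rgb_frame`, the thresholds `near` and `far`, and the treatment of zero depth as a missing reading are all assumptions made for illustration only.

```python
import numpy as np


def depth_soft_attention(depth, near=0.5, far=1.5):
    """Map a depth frame (in metres) to soft weights in [0, 1].

    Hypothetical rule, not taken from the paper: pixels at or closer
    than `near` get weight 1, pixels at or beyond `far` get weight 0,
    with a linear ramp in between. A depth of 0 is assumed to mean
    "no reading" and gets weight 0.
    """
    depth = np.asarray(depth, dtype=np.float32)
    weights = np.clip((far - depth) / (far - near), 0.0, 1.0)
    weights[depth <= 0] = 0.0  # assumed marker for missing depth
    return weights


def weight_rgb_frame(rgb, depth, near=0.5, far=1.5):
    """Return the RGB frame scaled per pixel by the soft attention map."""
    weights = depth_soft_attention(depth, near, far)
    # Broadcast the (H, W) weights across the three colour channels.
    return np.asarray(rgb, dtype=np.float32) * weights[..., None]


if __name__ == "__main__":
    rgb = np.full((2, 2, 3), 100, dtype=np.uint8)
    depth = np.array([[0.5, 1.0], [1.5, 0.0]])
    print(weight_rgb_frame(rgb, depth)[..., 0])
```

Because the weights are continuous rather than a hard mask, background pixels are attenuated instead of removed, which is what "soft" attention refers to here. In a learned model the weights would come from a trained attention module rather than fixed thresholds.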

Pages: 1918-1925

Page count: 8
