Spatiotemporal decoupling attention transformer for 3D skeleton-based driver action recognition

Cited by: 0

Authors:

Xu, Zhuoyan [1]

Xu, Jingke [1,2,3]

Affiliations:

[1] Shenyang Jianzhu Univ, Sch Comp Sci & Engn, Shenyang 110168, Liaoning, Peoples R China

[2] Liaoning Prov Big Data Management & Anal Lab Urban, Shenyang 110168, Liaoning, Peoples R China

[3] Natl Special Comp Engn Technol Res Ctr, Shenyang Branch, Shenyang 110168, Peoples R China

Source:

COMPLEX & INTELLIGENT SYSTEMS | 2025, Vol. 11, Issue 04

Keywords:

In-vehicle scenarios; Autonomous Driving; Driver action recognition; Action recognition; Skeleton-based;

DOI:

10.1007/s40747-025-01811-1

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Driver action recognition is crucial for in-vehicle safety. We argue that the following factors limit related research. First, spatial constraints and obstructions inside the vehicle restrict the driver's range of motion, which produces similar action patterns and makes it difficult to capture the full body posture. Second, in skeleton-based action recognition, self-attention is typically computed within a single frame when establishing joint dependencies, ignoring both the effect of the body's spatial structure on the dependency weights and inter-frame dependencies. In addition, common convolution in the temporal flow focuses only on frame-level temporal features, ignoring motion pattern features at a higher semantic level. We propose a novel spatiotemporal decoupling attention transformer (SDA-TR). The SDA module uses a spatiotemporal decoupling strategy to decouple the weight computation according to body structure and to establish joint dependencies directly across multiple frames. The TFA module aggregates sub-action-level and frame-level temporal features to improve recognition accuracy for similar actions. On the driver action recognition dataset Drive&Act, using driver upper-body skeletons, SDA-TR achieves state-of-the-art performance. SDA-TR also achieves 92.2%/95.8% accuracy under the CS/CV benchmarks of NTU RGB+D 60 and 88.6%/89.8% accuracy under the CS/CSet benchmarks of NTU RGB+D 120, on par with other state-of-the-art methods. These results indicate that the method scales and generalizes well for action recognition.


Pages: 12

