Skeleton-Based Human Action Recognition with A Physics-Augmented Encoder-Decoder Network

Cited by: 0

Authors:

Guo, Hongji [1]

Aved, Alexander [2]

Roller, Collen [2]

Ardiles-Cruz, Erika [2]

Ji, Qiang [1]

Affiliations:

[1] Rensselaer Polytech Inst, Troy, NY 12180 USA

[2] Air Force Res Lab, Rome, NY 13441 USA

Source:

GEOSPATIAL INFORMATICS XIII | 2023 / Vol. 12525

Keywords:

Skeleton-based action recognition; physics; encoder-decoder; ensemble

DOI:

10.1117/12.2664115

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Human action recognition is important for many applications such as surveillance monitoring, safety, and healthcare. As 3D body skeletons can accurately characterize body actions and are robust to camera views, we propose a 3D skeleton-based human action recognition method. Unlike existing skeleton-based methods, which use only geometric features for action recognition, we propose a physics-augmented encoder-decoder model that produces physically plausible geometric features for human action recognition. Specifically, given the input skeleton sequence, the encoder performs a spatiotemporal graph convolution to produce spatiotemporal features for both predicting human actions and estimating the generalized positions and forces of body joints. The decoder, implemented as an ODE solver, takes the joint forces and solves the Euler-Lagrange equation to reconstruct the skeleton in the next frame. Training the model to simultaneously minimize the action classification and 3D skeleton reconstruction errors encourages the encoder to produce features that are discriminative and consistent with both the body skeletons and the underlying body dynamics. The physics-augmented spatiotemporal features are used for human action classification. We evaluate the proposed method on NTU-RGB+D, a large-scale dataset for skeleton-based action recognition. Compared with existing methods, our method achieves higher accuracy and better generalization ability.
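
The two ingredients named in the abstract, a decoder that integrates the equations of motion one frame forward and a training objective that sums classification and reconstruction errors, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the simplest possible Lagrangian, L = (1/2) m q̇² with a generalized force τ, for which the Euler-Lagrange equation reduces to m q̈ = τ. It also assumes a semi-implicit Euler integrator, a mean-squared reconstruction error, and a scalar loss weight. All function names and parameters (`euler_lagrange_step`, `joint_loss`, `mass`, `dt`, `weight`) are hypothetical and do not come from the paper.

```python
import math


def euler_lagrange_step(q, v, tau, mass=1.0, dt=1.0 / 30.0):
    """Advance generalized positions q and velocities v by one frame.

    Assumes the toy dynamics mass * q_ddot = tau for every coordinate
    (hypothetical simplification) and uses semi-implicit Euler integration.
    """
    v_next = [vi + dt * ti / mass for vi, ti in zip(v, tau)]
    q_next = [qi + dt * vi for qi, vi in zip(q, v_next)]
    return q_next, v_next


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def reconstruction_error(q_pred, q_true):
    """Mean squared error between predicted and observed coordinates."""
    return sum((a - b) ** 2 for a, b in zip(q_pred, q_true)) / len(q_true)


def joint_loss(logits, label, q_pred, q_true, weight=1.0):
    """Classification loss plus weighted reconstruction loss (assumed form)."""
    return cross_entropy(logits, label) + weight * reconstruction_error(q_pred, q_true)


# Usage: predict the next frame from estimated forces, then score it.
q_next, v_next = euler_lagrange_step([0.0, 1.0], [1.0, 0.0], [0.0, 2.0], mass=1.0, dt=0.5)
loss = joint_loss([0.0, 0.0], 0, q_next, [0.5, 1.5])
```

In a trainable version of this idea, the forces and class scores would come from a learned encoder and the loss would be minimized by gradient descent. The sketch only shows how a physics step and a two-term objective fit together.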


Pages: 10
