Action Recognition Based on 3D Skeleton and RGB Frame Fusion

Times cited: 0

Authors:

Liu, Guiyu [1]

Qian, Jiuchao [1]

Wen, Fei [1]

Zhu, Xiaoguang [1]

Ying, Rendong [1]

Liu, Peilin [1]

Affiliation:

[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China

Source:

2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) | 2019

Keywords:

SEGMENTATION;

DOI:

10.1109/iros40897.2019.8967570

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Action recognition has wide applications in assisted living, health monitoring, surveillance, and human-computer interaction. Among traditional action recognition methods, RGB video-based methods are effective but computationally inefficient, while skeleton-based methods are computationally efficient but do not make use of low-level visual details. This work considers action recognition based on a multimodal fusion of the 3D skeleton and the RGB image. We design a neural network that takes as input a 3D skeleton sequence and a single middle frame from an RGB video. Specifically, our method selects one frame from a video and extracts spatial features from it using two attention modules: a self-attention module and a skeleton-attention module. Temporal features are then extracted from the skeleton sequence by a Bi-LSTM sub-network. Finally, the spatial and temporal features are combined by a feature fusion network for action classification. A distinctive feature of our method is that it uses only a single RGB frame rather than an entire RGB video. As a result, it has a lightweight architecture and is more efficient than RGB video-based methods. Comparative evaluation on two public datasets, NTU-RGBD and SYSU, demonstrates that our method achieves competitive performance compared with state-of-the-art methods.
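The abstract describes a two-stream design: spatial features from one RGB frame (via attention modules), temporal features from the skeleton sequence (via a Bi-LSTM), and a fusion network for classification. The following is a minimal, untrained sketch of that data flow in plain Python, not the authors' implementation. Several parts are assumptions: the RGB frame is represented by a precomputed grid of per-cell feature vectors (the image backbone is not shown), the learned skeleton-attention module is replaced by a hand-crafted Gaussian weighting around hypothetical projected 2D joint positions, the self-attention module is omitted, each skeleton frame is flattened into one vector of 3D joint coordinates, and the fusion network is reduced to concatenation followed by one linear layer and a softmax. All function names, weights, and sizes are hypothetical.

```python
import math
import random

# Hypothetical sketch of the skeleton + single-RGB-frame pipeline.
# All weights are random and untrained; sizes are illustrative assumptions.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class LSTMCell:
    """Standard LSTM cell: input, forget, output gates plus a tanh candidate."""
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One (W, U, b) triple per gate.
        self.params = {g: (rand_matrix(hid_dim, in_dim, rng),
                           rand_matrix(hid_dim, hid_dim, rng),
                           [0.0] * hid_dim)
                       for g in ("i", "f", "o", "g")}

    def _pre(self, gate, x, h):
        W, U, b = self.params[gate]
        return [a + u + c for a, u, c in zip(matvec(W, x), matvec(U, h), b)]

    def run(self, seq):
        """Process a sequence of vectors and return the final hidden state."""
        h = [0.0] * self.hid_dim
        c = [0.0] * self.hid_dim
        for x in seq:
            i = [sigmoid(v) for v in self._pre("i", x, h)]
            f = [sigmoid(v) for v in self._pre("f", x, h)]
            o = [sigmoid(v) for v in self._pre("o", x, h)]
            g = [math.tanh(v) for v in self._pre("g", x, h)]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h

def bilstm_features(seq, fwd, bwd):
    """Temporal feature: final forward state concatenated with final backward state."""
    return fwd.run(seq) + bwd.run(seq[::-1])

def skeleton_attention_pool(feature_map, joints_2d, sigma=1.0):
    """Weight grid cells by Gaussian proximity to 2D joints, then average features.

    Hand-crafted stand-in for a learned skeleton-attention module (assumption).
    feature_map maps (row, col) -> feature vector; joints_2d lists (row, col).
    """
    scores = {cell: sum(math.exp(-((cell[0] - jr) ** 2 + (cell[1] - jc) ** 2)
                                 / (2 * sigma ** 2)) for jr, jc in joints_2d)
              for cell in feature_map}
    total = sum(scores.values())
    dim = len(next(iter(feature_map.values())))
    pooled = [0.0] * dim
    for cell, feat in feature_map.items():
        w = scores[cell] / total  # attention weights sum to 1
        pooled = [p + w * x for p, x in zip(pooled, feat)]
    return pooled

def fuse_and_classify(spatial, temporal, W_cls):
    """Concatenate both features, apply one linear layer, return softmax probabilities."""
    logits = matvec(W_cls, spatial + temporal)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

# --- Usage with random stand-in data ---
rng = random.Random(0)
T, J, HID, C, NUM_CLASSES = 30, 25, 16, 8, 60  # 25 joints, 60 classes as in NTU RGB+D
skeleton_seq = [[rng.uniform(-1, 1) for _ in range(3 * J)] for _ in range(T)]
mid = T // 2  # index of the single (middle) RGB frame the method would use

feature_map = {(r, c): [rng.uniform(-1, 1) for _ in range(C)]
               for r in range(4) for c in range(4)}
joints_2d = [(1.0, 2.0), (2.5, 1.5), (3.0, 3.0)]  # hypothetical projected joints

spatial = skeleton_attention_pool(feature_map, joints_2d)
temporal = bilstm_features(skeleton_seq, LSTMCell(3 * J, HID, rng), LSTMCell(3 * J, HID, rng))
probs = fuse_and_classify(spatial, temporal, rand_matrix(NUM_CLASSES, C + 2 * HID, rng))
print(mid, len(spatial), len(temporal), len(probs))  # → 15 8 32 60
```

In this sketch only one image-feature grid is pooled per clip, while the Bi-LSTM consumes all T skeleton frames, each of which is just 75 numbers. That asymmetry is the intuition behind the efficiency claim; a real model would replace every stand-in above with trained layers.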


Pages: 258-264

Number of pages: 7
