Video Action Recognition by Combining Spatial-Temporal Cues with Graph Convolutional Networks

Cited by: 0

Authors:

Li, Tao [1]

Xiong, Wenjun [2]

Zhang, Zheng [2]

Pei, Lishen [3]

Affiliations:

[1] Open Univ Henan, Dept Informat Engn, Zhengzhou 450046, Peoples R China

[2] Open Univ Henan, Resource Construct & Management Ctr, Zhengzhou 450046, Peoples R China

[3] Henan Univ Econ & Law, Dept Informat Engn, Zhengzhou 450046, Peoples R China

Source:

INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

Video action recognition; graph convolutional networks; spatial-temporal graphs; feature combination;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video action recognition relies heavily on how spatial-temporal cues are combined to enhance recognition accuracy. This issue can be addressed by explicitly modeling interactions among objects within or between videos, for example with graph neural networks, which have been shown to accurately model and represent complicated spatial-temporal object relations for video action classification. However, the visual objects in a video are diverse, whereas the nodes in the graphs are fixed. This may result in information overload or loss if the visual objects are too redundant or insufficient for graph construction. To address this, segment-level graph convolutional networks (SLGCNs) are proposed as a method for recognizing actions in videos. The SLGCN consists of a segment-level spatial graph and a segment-level temporal graph, which together process spatial and temporal information simultaneously. Specifically, the segment-level spatial graph and the segment-level temporal graph are constructed using 2D and 3D CNNs to extract appearance and motion features from video segments. Graph convolutions are then applied to obtain informative segment-level spatial-temporal features. The method is evaluated on a variety of challenging video datasets, including EPIC-Kitchens, FCVID, HMDB51 and UCF101. Experiments demonstrate that the SLGCN achieves performance comparable to state-of-the-art models in terms of obtaining spatial-temporal features.
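The abstract does not give the layer definitions, so the following is only a minimal sketch of the general idea it describes: per-segment appearance and motion features form the nodes of two graphs, and a graph convolution is applied to each. Everything concrete here is an assumption made for illustration and is not taken from the paper: the softmax similarity adjacency, the single ReLU(AXW) layer, the mean pooling, the concatenation, the feature sizes, and the helper names `similarity_adjacency` and `graph_conv`. Random arrays stand in for the 2D and 3D CNN outputs.

```python
import numpy as np

def similarity_adjacency(features):
    """Row-normalized affinity between segments (softmax over dot products).
    Assumed graph construction; the paper's actual rule is not given in the abstract."""
    scores = features @ features.T                        # (T, T) pairwise similarity
    scores = scores - scores.max(axis=1, keepdims=True)   # subtract row max for stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)   # each row sums to 1

def graph_conv(features, adjacency, weight):
    """One generic graph convolution layer: ReLU(A X W)."""
    return np.maximum(adjacency @ features @ weight, 0.0)

rng = np.random.default_rng(0)
num_segments, dim_2d, dim_3d, dim_out = 8, 16, 12, 4      # hypothetical sizes

# Stand-ins for per-segment features from a 2D CNN (appearance) and a 3D CNN (motion).
appearance = rng.standard_normal((num_segments, dim_2d))
motion = rng.standard_normal((num_segments, dim_3d))

# Segment-level spatial and temporal graphs, each followed by one graph convolution.
spatial_out = graph_conv(appearance, similarity_adjacency(appearance),
                         rng.standard_normal((dim_2d, dim_out)))
temporal_out = graph_conv(motion, similarity_adjacency(motion),
                          rng.standard_normal((dim_3d, dim_out)))

# Pool over segments and concatenate into a single video-level descriptor.
video_feature = np.concatenate([spatial_out.mean(axis=0), temporal_out.mean(axis=0)])
```

In a real pipeline the weight matrices would be learned and the pooled descriptor would feed a classifier; the sketch only shows how segment-level nodes, an adjacency matrix, and a graph convolution fit together.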


Pages: 24
