Video Action Recognition by Combining Spatial-Temporal Cues with Graph Convolutional Networks

Authors
Li, Tao [1]
Xiong, Wenjun [2]
Zhang, Zheng [2]
Pei, Lishen [3]
Affiliations
[1] Open Univ Henan, Dept Informat Engn, Zhengzhou 450046, Peoples R China
[2] Open Univ Henan, Resource Construct & Management Ctr, Zhengzhou 450046, Peoples R China
[3] Henan Univ Econ & Law, Dept Informat Engn, Zhengzhou 450046, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video action recognition; graph convolutional networks; spatial-temporal graphs; feature combination
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video action recognition relies heavily on how spatio-temporal cues are combined. One way to address this is to explicitly model interactions among objects within or between videos, for example with graph neural networks, which have been shown to accurately model and represent complicated spatial-temporal object relations for video action classification. However, the visual objects in a video are diverse, whereas the nodes in the graphs are fixed. This may result in information overload when the visual objects are redundant, or in information loss when they are insufficient for graph construction. To address this, segment-level graph convolutional networks (SLGCNs) are proposed for recognizing actions in videos. The SLGCN consists of a segment-level spatial graph and a segment-level temporal graph, which together process spatial and temporal information simultaneously. Specifically, the two graphs are constructed from appearance and motion features extracted from video segments with 2D and 3D CNNs. Graph convolutions are then applied to obtain informative segment-level spatial-temporal features. The method is evaluated on several challenging video datasets, including EPIC-Kitchens, FCVID, HMDB51 and UCF101. Experiments demonstrate that the SLGCN achieves performance comparable to state-of-the-art models in terms of obtaining spatial-temporal features.
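The abstract does not specify how the segment graphs are built or which graph convolution is used, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that 2D CNN appearance features feed the spatial graph and 3D CNN motion features feed the temporal graph, that edges come from cosine similarity between segments, and that a standard symmetric-normalized graph convolution with ReLU is applied. All function names, feature sizes, and the random stand-in features are hypothetical.

```python
import numpy as np

def normalized_adjacency(features):
    """Dense segment graph from cosine similarity (an assumption, not from the paper)."""
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    sim = np.clip(unit @ unit.T, 0.0, None)   # keep only non-negative affinities
    np.fill_diagonal(sim, 1.0)                # self-loops, so every degree is >= 1
    d_inv_sqrt = 1.0 / np.sqrt(sim.sum(axis=1))
    # Symmetric normalization: D^{-1/2} A D^{-1/2}
    return sim * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(features, adj, weight):
    """One graph convolution layer: ReLU(A_hat @ X @ W)."""
    return np.maximum(adj @ features @ weight, 0.0)

def segment_level_features(appearance, motion, w_spatial, w_temporal):
    """Run one layer on each graph, concatenate, and average over segments."""
    spatial = graph_conv(appearance, normalized_adjacency(appearance), w_spatial)
    temporal = graph_conv(motion, normalized_adjacency(motion), w_temporal)
    return np.concatenate([spatial, temporal], axis=1).mean(axis=0)

rng = np.random.default_rng(0)
num_segments = 6
appearance = rng.normal(size=(num_segments, 16))  # stand-in for 2D CNN features
motion = rng.normal(size=(num_segments, 12))      # stand-in for 3D CNN features
w_spatial = rng.normal(size=(16, 4))              # hypothetical layer weights
w_temporal = rng.normal(size=(12, 4))

video_feature = segment_level_features(appearance, motion, w_spatial, w_temporal)
print(video_feature.shape)  # (8,)
```

In this sketch each video segment is one graph node, so the number of nodes follows the number of segments rather than the number of detected objects. The resulting vector would normally be passed to a classifier, which is omitted here.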
Pages: 24
Published in: International Journal of Pattern Recognition and Artificial Intelligence, 2023