Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition

Cited by: 50
Authors
Huang, Zhen [1]
Shen, Xu [2]
Tian, Xinmei [1]
Li, Houqiang [1]
Huang, Jianqiang [2]
Hua, Xian-Sheng [2]
Affiliations
[1] University of Science and Technology of China, Hefei, People's Republic of China
[2] Alibaba Group, Shenzhen, People's Republic of China
Source
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020
Funding
National Natural Science Foundation of China;
Keywords
graph convolutional networks; skeleton-based classification; FORM;
DOI
10.1145/3394171.3413666
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Skeleton-based human action recognition has attracted much attention with the prevalence of accessible depth sensors. Recently, graph convolutional networks (GCNs) have been widely used for this task due to their powerful capability to model graph data. The topology of the adjacency graph is a key factor for modeling the correlations of the input skeletons. Thus, previous methods mainly focus on the design/learning of the graph topology. But once the topology is learned, only a single-scale feature and one transformation exist in each layer of the networks. Many insights, such as multi-scale information and multiple sets of transformations, that have been proven to be very effective in convolutional neural networks (CNNs), have not been investigated in GCNs. The reason is that, due to the gap between graph-structured skeleton data and conventional image/video data, it is very challenging to embed these insights into GCNs. To overcome this gap, we reinvent the split-transform-merge strategy in GCNs for skeleton sequence processing. Specifically, we design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition. Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths. Extensive experiments demonstrate that our network outperforms state-of-the-art methods by a significant margin with only 1/5 of the parameters and 1/10 of the FLOPs.
Pages: 2122-2130
Number of pages: 9