InfoGCN: Representation Learning for Human Skeleton-based Action Recognition

Cited by: 226

Authors:

Chi, Hyung-gun [1]

Ha, Myoung Hoon [2]

Chi, Seunggeun [1]

Lee, Sang Wan [2]

Huang, Qixing [3]

Ramani, Karthik [1,4]

Affiliations:

[1] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA

[2] Korea Adv Inst Sci & Technol, Daejeon, South Korea

[3] Univ Texas Austin, Austin, TX 78712 USA

[4] Purdue Univ, Sch Mech Engn, W Lafayette, IN 47907 USA

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022

Funding:

U.S. National Science Foundation; National Research Foundation, Singapore;

Keywords:

DOI:

10.1109/CVPR52688.2022.01955

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human skeleton-based action recognition offers a valuable means to understand the intricacies of human behavior because it can handle the complex relationships between physical constraints and intention. Although several studies have focused on encoding a skeleton, less attention has been paid to embedding this information into the latent representations of human action. InfoGCN proposes a learning framework for action recognition combining a novel learning objective and an encoding method. First, we design an information bottleneck-based learning objective to guide the model to learn informative but compact latent representations. To provide discriminative information for classifying actions, we introduce attention-based graph convolution that captures the context-dependent intrinsic topology of human action. In addition, we present a multi-modal representation of the skeleton using the relative position of joints, designed to provide complementary spatial information for joints. InfoGCN surpasses the known state-of-the-art on multiple skeleton-based action recognition benchmarks, with accuracies of 93.0% on the NTU RGB+D 60 cross-subject split, 89.8% on the NTU RGB+D 120 cross-subject split, and 97.0% on NW-UCLA.
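The abstract mentions a skeleton representation built from the relative position of joints. As a rough illustration only, the sketch below expresses each joint as an offset from a chosen reference joint and produces one such view per reference. The function names `relative_positions` and `multi_view`, the choice of reference joints, and the toy coordinates are all assumptions made for this example; they are not taken from the paper or its released code.

```python
from typing import List, Sequence

Joint = Sequence[float]   # (x, y, z) coordinates of one joint
Frame = Sequence[Joint]   # all joints of the skeleton in one frame


def relative_positions(frame: Frame, ref_index: int) -> List[List[float]]:
    """Express every joint as an offset from the joint at ref_index."""
    ref = frame[ref_index]
    return [[c - r for c, r in zip(joint, ref)] for joint in frame]


def multi_view(frame: Frame, ref_indices: Sequence[int]) -> List[List[List[float]]]:
    """Build one relative-position view of the frame per reference joint."""
    return [relative_positions(frame, k) for k in ref_indices]


# Toy 3-joint skeleton (hypothetical coordinates, not from any dataset).
frame = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 3.0, 1.0]]
views = multi_view(frame, ref_indices=[0, 1])
```

Each view describes the same pose from a different anchor joint, which is one simple way such views could carry complementary spatial information.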

Citation

Pages: 20154-20164

Page count: 11

