Cross-Model Cross-Stream Learning for Self-Supervised Human Action Recognition

Cited by: 2

Authors:

Liu, Mengyuan [1]

Liu, Hong [1]

Guo, Tianyu [1]

Affiliations:

[1] Peking Univ, Shenzhen Grad Sch, State Key Lab Gen Artificial Intelligence, Shenzhen 518055, Peoples R China

Source:

IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS | 2024 / Vol. 54 / No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

Multistream; self-supervised learning; skeleton-based action recognition; NETWORKS; FUSION; LSTM;

DOI:

10.1109/THMS.2024.3467334

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Owing to their instance-level discriminative ability, contrastive learning methods such as MoCo and SimCLR have been adapted from image representation learning to self-supervised skeleton-based action recognition. These methods usually use multiple data streams (i.e., joint, motion, and bone) for ensemble learning. However, how to construct a discriminative feature space within a single stream and how to effectively aggregate information from multiple streams remain open problems. To this end, this article first applies a new contrastive learning method called bootstrap your own latent (BYOL) to skeleton data and formulates SkeletonBYOL as a simple yet effective baseline for self-supervised skeleton-based action recognition. Inspired by SkeletonBYOL, this article further presents a cross-model and cross-stream (CMCS) framework that combines cross-model adversarial learning (CMAL) and cross-stream collaborative learning (CSCL). Specifically, CMAL learns single-stream representations with a cross-model adversarial loss to obtain more discriminative features. To aggregate multistream information and let the streams interact, CSCL generates similarity pseudolabels from ensemble learning as supervision to guide feature generation for the individual streams. Extensive experiments on three datasets verify that CMAL and CSCL are complementary and that the proposed method achieves better results than state-of-the-art methods under various evaluation protocols.
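The abstract names BYOL as the starting point but does not spell out its objective. As a rough illustration only, the sketch below shows the generic BYOL regression loss (2 minus twice the cosine similarity between the online prediction and the target projection) and the exponential-moving-average update of the target network. It is not the SkeletonBYOL or CMCS implementation; the function names, the plain-list vector representation, and the default `tau` are assumptions made for this example, and inputs are assumed to be nonzero vectors.

```python
import math


def l2_normalize(vector):
    """Scale a nonzero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def byol_loss(online_prediction, target_projection):
    """Generic BYOL loss: 2 - 2 * cosine similarity.

    In training, the target projection is treated as a constant
    (no gradient flows through the target network).
    """
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    cosine = sum(a * b for a, b in zip(p, z))
    return 2.0 - 2.0 * cosine


def ema_update(target_params, online_params, tau=0.99):
    """Move target parameters toward online parameters (hypothetical flat lists)."""
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]
```

The loss is 0 when the two vectors point in the same direction and 4 when they point in opposite directions, so minimizing it pulls the online prediction toward the target projection of another augmented view of the same sample.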


Pages: 743-752

Page count: 10
