Beyond-Skeleton: Zero-shot Skeleton Action Recognition enhanced by supplementary RGB visual information

Cited by: 0

Authors:

Liu, Hongjie [1]

Niu, Yingchun [1]

Zeng, Kun [2]

Liu, Chun [1]

Hu, Mengjie [1]

Song, Qing [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China

[2] Minjiang Univ, Coll Comp & Data Sci, Fujian Prov Key Lab Informat Proc & Intelligent Co, Fuzhou, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2025 / Volume 273

Keywords:

Zero-shot learning; Human skeleton data; Prompt learning; Action recognition; REPRESENTATION; LANGUAGE;

DOI:

10.1016/j.eswa.2025.126814

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Zero-shot action recognition (ZSAR) recognizes action categories that did not appear during training, and it has attracted widespread attention because it can reduce the cost of retraining and data annotation. We observe that existing skeleton-based ZSAR methods use only the human posture information in the skeleton sequence, lack discriminative semantic representations for distinguishing similar behaviors, and lack effective interaction between modalities, which results in unsatisfactory performance and limits the applications of ZSAR. To solve these problems, we propose a novel method, Beyond-Skeleton zero-shot Learning (BSZSL), to enhance zero-shot skeleton action recognition. First, we introduce a multi-prompt learning strategy. It uses prompt information to guide the model to simultaneously learn complementary semantic information related to behavior categories from both skeleton sequences and RGB information, making the visual features more expressive. Specifically, it employs a pre-trained multimodal model to extract behavior-related prior knowledge from RGB and then uses this knowledge to guide the skeleton sequence features, which enhances the complementary features of the RGB and skeleton modalities. Second, to constrain the mapping relationship between features of different modalities, we design a Contrastive Clustering (CC) module. This module emphasizes the similarity of features within the same category while increasing the differences in feature mapping between different categories. Finally, we evaluate our method on the NTU-60 and NTU-120 datasets under multi-split settings; the results demonstrate that it achieves state-of-the-art performance in both the zero-shot learning (ZSL) and generalized zero-shot learning (GZSL) settings.

Pages: 11
