Representation modeling learning with multi-domain decoupling for unsupervised skeleton-based action recognition

Times cited: 0

Authors:

He, Zhiquan [1,2]

Lv, Jiantu [2]

Fang, Shizhang [2]

Affiliations:

[1] Guangdong Key Lab Intelligent Informat Proc, Shenzhen, Peoples R China

[2] Shenzhen Univ, Guangdong Multimedia Informat Serv Engn Technol Re, Shenzhen, Peoples R China

Source:

NEUROCOMPUTING | 2024 / Vol. 582

Funding:

National Natural Science Foundation of China;

Keywords:

Unsupervised learning; Contrastive learning; Action recognition;

DOI:

10.1016/j.neucom.2024.127495

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Skeleton-based action recognition is a fundamental research topic in computer vision. In recent years, the unsupervised contrastive learning paradigm has achieved great success in skeleton-based action recognition. However, previous work often treated input skeleton sequences as a whole when performing comparisons and therefore lacked fine-grained contrastive representation learning. We therefore propose a contrastive learning method for Representation Modeling with Multi-domain Decoupling (RMMD), which extracts the most significant representations from input skeleton sequences in the temporal, spatial and frequency domains, respectively. Specifically, in the temporal and spatial domains, we propose a multi-level spatiotemporal mining reconstruction module (STMR) that iteratively reconstructs the original input skeleton sequences to highlight the spatiotemporal representations of different actions. In addition, we introduce position encoding and a global adaptive attention matrix, which balance global and local information and effectively model the spatiotemporal dependencies between joints. In the frequency domain, we use the discrete cosine transform (DCT) to convert sequences from the temporal domain to the frequency domain and discard part of the interfering information, and then use frequency self-attention (FSA) and a multi-level aggregation perceptron (MLAP) to explore the frequency-domain representation in depth. Fusing the temporal, spatial and frequency domain representations makes our model more discriminative across different actions. We verify the effectiveness of the model on the NTU RGB+D and PKU-MMD datasets. Extensive experiments show that our method outperforms existing unsupervised methods and achieves significant performance improvements on downstream tasks such as action recognition and action retrieval.
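The abstract says only that the DCT performs temporal-to-frequency conversion and that part of the interfering information is discarded. The NumPy sketch below shows one common way to realize such a step. It assumes an orthonormal DCT-II applied along the time axis of a (frames, joints, coordinates) array and assumes that the retained part is the first `keep` low-frequency coefficients. The function names and the `keep` parameter are hypothetical and are not taken from RMMD. The FSA and MLAP modules that follow in the paper are not modeled here.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis of shape (n, n); row k is the k-th cosine basis vector."""
    t = np.arange(n)
    k = t[:, None]
    mat = np.cos(np.pi * (2 * t[None, :] + 1) * k / (2 * n))
    mat[0] *= np.sqrt(1.0 / n)   # scale the DC row
    mat[1:] *= np.sqrt(2.0 / n)  # scale the remaining rows so the basis is orthonormal
    return mat


def temporal_dct_lowpass(seq: np.ndarray, keep: int) -> np.ndarray:
    """Hypothetical helper: map a (T, V, C) skeleton sequence to its first `keep`
    temporal DCT coefficients, returning shape (keep, V, C).
    Higher-frequency coefficients are dropped (assumed here to be the discarded part)."""
    num_frames = seq.shape[0]
    if not 1 <= keep <= num_frames:
        raise ValueError("keep must be in [1, T]")
    basis = dct_matrix(num_frames)
    coeffs = np.einsum("kt,tvc->kvc", basis, seq)  # transform along the time axis only
    return coeffs[:keep]


def inverse_temporal_dct(coeffs: np.ndarray, num_frames: int) -> np.ndarray:
    """Reconstruct a (T, V, C) sequence from truncated coefficients; the dropped
    coefficients are treated as zero."""
    basis = dct_matrix(num_frames)[: coeffs.shape[0]]
    return np.einsum("kt,kvc->tvc", basis, coeffs)


# Usage: 64 frames, 25 joints (the NTU RGB+D joint layout), 3D coordinates.
rng = np.random.default_rng(0)
seq = rng.standard_normal((64, 25, 3))
low = temporal_dct_lowpass(seq, keep=16)
print(low.shape)  # (16, 25, 3)
smooth = inverse_temporal_dct(low, num_frames=64)
```

The orthonormal basis makes the transform exactly invertible when all coefficients are kept. Truncating the coefficients therefore acts purely as a temporal low-pass filter that suppresses fast jitter in joint trajectories. Whether RMMD keeps the low-frequency band, and how many coefficients it keeps, is not stated in the abstract.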

Pages: 11
