View Enhanced Jigsaw Puzzle for Self-Supervised Feature Learning in 3D Human Action Recognition

Cited by: 4

Authors:

You, Wei [1]

Wang, Xue [1]

Affiliations:

[1] Tsinghua Univ, Dept Precis Instrument, Beijing 100084, Peoples R China

Source:

IEEE ACCESS | 2022 / Vol. 10

Keywords:

Task analysis; Skeleton; Feature extraction; Training; Representation learning; Three-dimensional displays; Image recognition; Action recognition; self-supervised learning; multi-view; pretext task; human skeleton; gate recurrent unit;

DOI:

10.1109/ACCESS.2022.3165040

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Self-supervised learning methods have received much attention in skeleton-based human action recognition. These methods rely on pretext tasks to utilize unlabeled data and learn an effective feature encoder. In this paper, a novel self-supervised learning method is proposed. First, we design a new pretext task called view enhanced jigsaw puzzle (VEJP) to improve the learning difficulty of the encoder. The VEJP introduces multi-view information into the jigsaw puzzle, thus forcing the encoder to learn view-independent high-level features of human skeletons. Based on the encoder trained by VEJP, we propose the view pooling encoder (VPE) to integrate the information of multiple views with the pooling mechanism, and the features extracted by VPE are more robust and distinguishable. In addition, by adjusting the difficulty of VEJP, the influence of the pretext task difficulty on the downstream task performance is studied, and the experimental results show that the pretext tasks should be moderately difficult to achieve effective feature learning. Our method achieves competitive results on representative benchmark datasets. It provides a strong baseline for the jigsaw puzzle task and shows advantages in situations where the number of labeled data is minimal.
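The two ideas named in the abstract, a jigsaw puzzle augmented with view changes and pooling over views, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a skeleton sequence stored as a NumPy array of shape (frames, joints, 3), simulates views by rotating about the vertical axis, treats the puzzle as a classification over temporal permutations, and uses element-wise max as the pooling operator. The function names `rotate_y`, `make_view_jigsaw_sample`, and `view_pool` are hypothetical.

```python
import itertools
import numpy as np


def rotate_y(seq, angle):
    """Rotate a skeleton sequence of shape (T, J, 3) about the y axis (assumed vertical)."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return seq @ rot.T


def make_view_jigsaw_sample(seq, n_segments, angles, rng):
    """Build one pretext-task sample (illustrative assumption, not the paper's recipe).

    The sequence is split into temporal segments, each segment is rotated to a
    randomly chosen view, and the segments are shuffled. The returned label is
    the index of the permutation that an encoder would be trained to predict.
    """
    perms = list(itertools.permutations(range(n_segments)))
    label = int(rng.integers(len(perms)))
    segments = np.array_split(seq, n_segments, axis=0)
    views = [rotate_y(seg, rng.choice(angles)) for seg in segments]
    shuffled = np.concatenate([views[i] for i in perms[label]], axis=0)
    return shuffled, label


def view_pool(features):
    """Pool per-view feature vectors of shape (V, D) with an element-wise max (assumed)."""
    return features.max(axis=0)


rng = np.random.default_rng(0)
seq = rng.normal(size=(12, 5, 3))  # 12 frames, 5 joints, xyz coordinates
shuffled, label = make_view_jigsaw_sample(seq, 3, [0.0, np.pi / 2], rng)
pooled = view_pool(np.array([[1.0, 5.0], [3.0, 2.0]]))
```

In this sketch, raising `n_segments` or widening `angles` makes the puzzle harder, which is one plausible way to vary pretext-task difficulty of the kind the abstract discusses.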


Pages: 36385-36396

Page count: 12

50 items in total

[21] Self-supervised pretext task collaborative multi-view contrastive learning for video action recognition [J].

Shuai Bi ;

Zhengping Hu ;

Mengyao Zhao ;

Hehao Zhang ;

Jirui Di ;

Zhe Sun .

Signal, Image and Video Processing, 2023, 17 :3775-3782

[22] Learning 3D Skeletal Representation From Transformer for Action Recognition [J].

Cha, Junuk ;

Saqlain, Muhammad ;

Kim, Donguk ;

Lee, Seungeun ;

Lee, Seongyeong ;

Baek, Seungryul .

IEEE ACCESS, 2022, 10 :67541-67550

[23] Self-supervised Feature Learning for 3D Medical Images by Playing a Rubik's Cube [J].

Zhuang, Xinrui ;

Li, Yuexiang ;

Hu, Yifan ;

Ma, Kai ;

Yang, Yujiu ;

Zheng, Yefeng .

MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2019, PT IV, 2019, 11767 :420-428

[24] Self-supervised Learning with Multi-view Rendering for 3D Point Cloud Analysis [J].

Tran, Bach ;

Hua, Binh-Son ;

Tran, Anh Tuan ;

Hoai, Minh .

COMPUTER VISION - ACCV 2022, PT I, 2023, 13841 :413-431

[25] Point cloud self-supervised learning for machining feature recognition [J].

Zhang, Hang ;

Wang, Wenhu ;

Zhang, Shusheng ;

Wang, Zhen ;

Zhang, Yajun ;

Zhou, Jingtao ;

Huang, Bo .

JOURNAL OF MANUFACTURING SYSTEMS, 2024, 77 :78-95

[26] Feature Learning Capacity Assessment of Deep Convolutional Generative Adversarial Network for Action Recognition in a Self-Supervised Framework [J].

Azrab, Samia ;

Mahmood, Muhammad Habib .

2021 INTERNATIONAL CONFERENCE ON DIGITAL FUTURES AND TRANSFORMATIVE TECHNOLOGIES (ICODT2), 2021,

[27] Contrast-Reconstruction Representation Learning for Self-Supervised Skeleton-Based Action Recognition [J].

Wang, Peng ;

Wen, Jun ;

Si, Chenyang ;

Qian, Yuntao ;

Wang, Liang .

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 :6224-6238

[28] Dysarthric Speech Recognition Using Pseudo-Labeling, Self-Supervised Feature Learning, and a Joint Multi-Task Learning Approach [J].

Takashima, Ryoichi ;

Sawa, Yuya ;

Aihara, Ryo ;

Takiguchi, Tetsuya ;

Imai, Yoshie .

IEEE ACCESS, 2024, 12 :36990-36999

[29] Self-Supervised Learning via Multi-Transformation Classification for Action Recognition [J].

Duc-Quang Vu ;

Ngan Le ;

Wang, Jia-Ching .

2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS, ICMEW 2024, 2024,

[30] ATOM: Self-supervised human action recognition using atomic motion representation learning [J].

Degardin, Bruno ;

Lopes, Vasco ;

Proenca, Hugo .

IMAGE AND VISION COMPUTING, 2023, 137
