You Will Never Walk Alone: One-Shot 3D Action Recognition With Point Cloud Sequence

Cited by: 0
Authors
Tong, Xingyu [1 ]
Xiao, Yang [1 ]
Tan, Bo [1 ]
Yang, Jianyu [2 ]
Cao, Zhiguo [1 ]
Zhou, Joey Tianyi [3 ,4 ,5 ]
Yuan, Junsong [6 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Natl Key Lab Multispectral Informat Intelligent P, Wuhan 430074, Peoples R China
[2] Soochow Univ, Sch Rail Transportat, Suzhou 215000, Peoples R China
[3] ASTAR, Ctr Frontier Res CFAR, Singapore 138632, Singapore
[4] ASTAR, Inst High Performance Comp IHPC, Singapore 138632, Singapore
[5] Ctr Adv Technol Online Safety CATOS, Singapore 138632, Singapore
[6] Univ Buffalo, State Univ New York, Comp Sci & Engn Dept, Buffalo, NY 14260 USA
Funding
National Natural Science Foundation of China; National Research Foundation, Singapore;
Keywords
One-shot 3D action recognition; 3D dynamic voxel; vision transformer; distribution calibration
DOI
10.1109/TCSVT.2024.3421304
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Classification Code
0808; 0809
Abstract
In this work, we make the first effort to address one-shot 3D action recognition on point cloud sequences, without skeleton information. The main contribution is two-fold. First, a novel one-shot classification approach that considers the feature distribution of 3D actions is proposed. We find that, for different 3D actions, the dimension-wise feature distributions are generally Gaussian, and that similar action categories hold similar feature distributions. Accordingly, the mean and covariance information of the K-nearest base classes helps form the one-shot novel class's pseudo feature distribution. To alleviate potential ambiguity in the nearest-neighbor search, we divide the base classes into subsets via C-means clustering to facilitate similarity measurement to the novel class. Meanwhile, the feature distributions of the base classes' whole set and subsets are jointly considered when generating the novel class's pseudo feature distribution. Multi-dimensional Gaussian sampling is then conducted on the acquired pseudo feature distribution for feature-level data augmentation, so that the one-shot novel class "never walks alone" when training the classifier. Second, to better characterize fine-grained 3D actions, a temporal attention method is proposed that introduces a vision Transformer (ViT) to capture an action's discriminative short-term motion patterns from densely sampled short-term 3DV (3D dynamic voxel) features along the temporal dimension. Experiments on NTU RGB+D 120 and NTU RGB+D 60 verify the superiority of our approach, which outperforms state-of-the-art skeleton-based methods by up to 13.9%. The source code is available at https://github.com/Tong-XY/YNWA.
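The distribution-calibration step sketched in the abstract can be illustrated compactly. The snippet below is a minimal sketch of that idea, not the authors' released implementation (see the GitHub repository above); the function name and the arguments `base_means`, `base_covs`, `k`, `alpha`, and `num_sampled` are hypothetical, and it assumes per-class feature means and covariances have already been computed from the base classes:

```python
import numpy as np

def calibrate_and_sample(novel_feat, base_means, base_covs, k=2,
                         alpha=0.2, num_sampled=100):
    """Sketch of distribution calibration for a one-shot novel class.

    novel_feat: (d,) feature of the single novel-class sample.
    base_means: (C, d) per-base-class feature means.
    base_covs:  (C, d, d) per-base-class feature covariances.
    Returns (num_sampled, d) features drawn from the pseudo distribution.
    """
    # Find the k base classes whose feature means are nearest to the
    # one-shot novel sample in feature space.
    dists = np.linalg.norm(base_means - novel_feat, axis=1)
    nearest = np.argsort(dists)[:k]

    # Pseudo mean: blend the novel feature with the nearest base means.
    mean = (base_means[nearest].sum(axis=0) + novel_feat) / (k + 1)

    # Pseudo covariance: average the nearest base covariances, with a
    # small diagonal term (alpha, a hyperparameter) to relax the estimate.
    cov = base_covs[nearest].mean(axis=0) + alpha * np.eye(len(novel_feat))

    # Multi-dimensional Gaussian sampling for feature-level augmentation,
    # so the one-shot novel class "never walks alone" during training.
    return np.random.multivariate_normal(mean, cov, size=num_sampled)
```

In the paper, this calibration is additionally performed against C-means subsets of the base classes alongside the whole set, and the sampled features augment the single novel-class example when training the classifier.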
Pages: 11464-11477
Page count: 14