Zoom Transformer for Skeleton-Based Group Activity Recognition

Cited by: 46

Authors:

Zhang, Jiaxu [1]

Jia, Yifan [2]

Xie, Wei [3]

Tu, Zhigang [1]

Affiliations:

[1] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & Re, Wuhan 430079, Peoples R China

[2] Wuhan Univ, Renmin Hosp, Dept Pain, Wuhan 430060, Peoples R China

[3] Cent China Normal Univ, Sch Comp, Wuhan 430079, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2022 / Vol. 32 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Skeleton; Transformers; Feature extraction; Activity recognition; Data mining; Visualization; Task analysis; skeleton-based action; visual transformer; attention mechanism;

DOI:

10.1109/TCSVT.2022.3193574

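Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code:

0808; 0809;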

Abstract:

Skeleton-based human action recognition has attracted increasing attention, and many methods have been proposed to boost performance. However, these methods still face three main limitations: 1) They focus on single-person action recognition while neglecting the group activity of multiple people (more than 5 people). In practice, multi-person group activity recognition via skeleton data is also a meaningful problem. 2) They are unable to mine high-level semantic information from the skeleton data, such as interactions among multiple people and their positional relationships. 3) Existing datasets used for multi-person group activity recognition all involve RGB videos, which cannot be directly applied to skeleton-based group activity analysis. To address these issues, we propose a novel Zoom Transformer to exploit both the low-level single-person motion information and the high-level multi-person interaction information in a uniform model structure with carefully designed Relation-aware Maps. In addition, we estimate the multi-person skeletons from the existing real-world video datasets, i.e., Kinetics and Volleyball-Activity, and release two new benchmarks to verify the effectiveness of our Zoom Transformer. Extensive experiments demonstrate that our model can effectively cope with skeleton-based multi-person group activity. Additionally, experiments on the large-scale NTU-RGB+D dataset validate that our model also achieves remarkable performance for single-person action recognition. The code and the skeleton data are publicly available at https://github.com/Kebii/Zoom-Transformer


Pages: 8646-8659

Page count: 14

References: 67 in total

