NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

Cited by: 945
Authors
Liu, Jun [1]
Shahroudy, Amir [2]
Perez, Mauricio [1]
Wang, Gang [5]
Duan, Ling-Yu [3,4]
Kot, Alex C. [1]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Rapid Rich Object Search Lab, Singapore 639798, Singapore
[2] Chalmers Univ Technol, Dept Elect Engn, S-41296 Gothenburg, Sweden
[3] Peking Univ, Natl Engn Lab Video Technol, Beijing 100871, Peoples R China
[4] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[5] Alibaba Grp, Hangzhou 310052, Peoples R China
Funding
National Research Foundation of Singapore; National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Benchmark testing; Cameras; Deep learning; Semantics; Lighting; Skeleton; Activity understanding; video analysis; 3D action recognition; RGB+D vision; deep learning; large-scale benchmark; ACTION RECOGNITION; ACTIONLET ENSEMBLE; FEATURES;
DOI
10.1109/TPAMI.2019.2916873
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Research on depth-based human activity analysis has achieved outstanding performance and demonstrated the effectiveness of 3D representations for action recognition. However, the existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, of a realistic number of distinct action classes, of diversity in camera views, of varied environmental conditions, and of variety in human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, collected from 106 distinct subjects and containing more than 114 thousand video samples and 8 million frames. The dataset covers 120 different action classes, including daily, mutual, and health-related activities. We evaluate a series of existing 3D activity analysis methods on this dataset and show the advantage of applying deep learning methods to 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset and propose a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework for this task, which yields promising results for the recognition of novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding.
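The abstract's one-shot APSR idea weights body parts by the semantic relevance between a novel action's name and body-part names, then matches a query against the single exemplar available per novel class. Below is a minimal, hypothetical Python sketch of that matching scheme, not the authors' implementation: the BODY_PARTS vocabulary, the stand-in word_vec embedding (a real system would use pretrained word vectors such as word2vec), the softmax part weighting, and all feature shapes are illustrative assumptions.

import hashlib
import numpy as np

BODY_PARTS = ["head", "hand", "arm", "torso", "leg", "foot"]  # assumed part vocabulary

def word_vec(word, dim=16):
    # Stand-in for a pretrained word embedding; a hash-seeded random
    # unit vector keeps this sketch self-contained and runnable.
    seed = int(hashlib.md5(word.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def part_weights(action_name):
    # Semantic relevance of each body part to the action name,
    # turned into softmax weights over the parts.
    a = np.mean([word_vec(w) for w in action_name.split()], axis=0)
    s = np.array([cosine(a, word_vec(p)) for p in BODY_PARTS])
    e = np.exp(s)
    return e / e.sum()

def descriptor(part_feats, w):
    # part_feats: (num_parts, feat_dim) features, assumed to come from
    # a pretrained skeleton encoder; the relevance-weighted sum
    # emphasizes the parts most related to the action's semantics.
    return (w[:, None] * part_feats).sum(axis=0)

def one_shot_classify(query_parts, exemplars):
    # exemplars: {action_name: (num_parts, feat_dim)}, one sample per
    # novel class; nearest exemplar under cosine similarity wins.
    best, best_sim = None, -np.inf
    for name, ex in exemplars.items():
        w = part_weights(name)
        sim = cosine(descriptor(query_parts, w), descriptor(ex, w))
        if sim > best_sim:
            best, best_sim = name, sim
    return best

# Toy usage: a noisy copy of the "drink water" exemplar should match it.
rng = np.random.default_rng(0)
exemplars = {n: rng.standard_normal((len(BODY_PARTS), 32))
             for n in ["drink water", "kick something"]}
query = exemplars["drink water"] + 0.1 * rng.standard_normal((len(BODY_PARTS), 32))
print(one_shot_classify(query, exemplars))  # -> drink water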
Pages: 2684-2701
Number of pages: 18