NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

Cited by: 945
Authors
Liu, Jun [1]
Shahroudy, Amir [2]
Perez, Mauricio [1]
Wang, Gang [5]
Duan, Ling-Yu [3,4]
Kot, Alex C. [1]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Rapid Rich Object Search Lab, Singapore 639798, Singapore
[2] Chalmers Univ Technol, Dept Elect Engn, S-41296 Gothenburg, Sweden
[3] Peking Univ, Natl Engn Lab Video Technol, Beijing 100871, Peoples R China
[4] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[5] Alibaba Grp, Hangzhou 310052, Peoples R China
Funding
National Research Foundation of Singapore; National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Benchmark testing; Cameras; Deep learning; Semantics; Lighting; Skeleton; Activity understanding; video analysis; 3D action recognition; RGB+D vision; deep learning; large-scale benchmark; ACTION RECOGNITION; ACTIONLET ENSEMBLE; FEATURES;
DOI
10.1109/TPAMI.2019.2916873
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Research on depth-based human activity analysis has achieved outstanding performance and demonstrated the effectiveness of 3D representations for action recognition. However, the existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, of a realistic number of distinct action classes, of diversity in camera views, of varied environmental conditions, and of variety in human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, collected from 106 distinct subjects and containing more than 114 thousand video samples and 8 million frames. The dataset covers 120 different action classes, including daily, mutual, and health-related activities. We evaluate a series of existing 3D activity analysis methods on this dataset and show the advantage of applying deep learning methods to 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset and propose a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework for this task, which yields promising results for the recognition of novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding.
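The abstract's one-shot APSR idea weights body parts by the semantic relevance between a novel action's name and body-part names, then matches a query against the single exemplar available per novel class. Below is a minimal, hypothetical Python sketch of that matching scheme, not the authors' implementation: the BODY_PARTS vocabulary, the stand-in word_vec embedding (a real system would use pretrained word vectors such as word2vec), the softmax part weighting, and all feature shapes are illustrative assumptions.

import hashlib
import numpy as np

BODY_PARTS = ["head", "hand", "arm", "torso", "leg", "foot"]  # assumed part vocabulary

def word_vec(word, dim=16):
    # Stand-in for a pretrained word embedding; a hash-seeded random
    # unit vector keeps this sketch self-contained and runnable.
    seed = int(hashlib.md5(word.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def part_weights(action_name):
    # Semantic relevance of each body part to the action name,
    # turned into softmax weights over the parts.
    a = np.mean([word_vec(w) for w in action_name.split()], axis=0)
    s = np.array([cosine(a, word_vec(p)) for p in BODY_PARTS])
    e = np.exp(s)
    return e / e.sum()

def descriptor(part_feats, w):
    # part_feats: (num_parts, feat_dim) features, assumed to come from
    # a pretrained skeleton encoder; the relevance-weighted sum
    # emphasizes the parts most related to the action's semantics.
    return (w[:, None] * part_feats).sum(axis=0)

def one_shot_classify(query_parts, exemplars):
    # exemplars: {action_name: (num_parts, feat_dim)}, one sample per
    # novel class; nearest exemplar under cosine similarity wins.
    best, best_sim = None, -np.inf
    for name, ex in exemplars.items():
        w = part_weights(name)
        sim = cosine(descriptor(query_parts, w), descriptor(ex, w))
        if sim > best_sim:
            best, best_sim = name, sim
    return best

# Toy usage: a noisy copy of the "drink water" exemplar should match it.
rng = np.random.default_rng(0)
exemplars = {n: rng.standard_normal((len(BODY_PARTS), 32))
             for n in ["drink water", "kick something"]}
query = exemplars["drink water"] + 0.1 * rng.standard_normal((len(BODY_PARTS), 32))
print(one_shot_classify(query, exemplars))  # -> drink water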
Pages: 2684-2701
Number of pages: 18