TriViews: A general framework to use 3D depth data effectively for action recognition

Cited by: 25
Authors
Chen, Wenbin [1 ]
Guo, Guodong [1 ]
Affiliations
[1] W Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
Keywords
Action recognition; 3D depth data; RGB-D sensor; Kinect; TriViews framework; Fusion; PFA; Public databases;
DOI
10.1016/j.jvcir.2014.11.008
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Number
0812;
Abstract
We present an effective framework, called TriViews, to utilize 3D depth data for action recognition. It projects the 3D depth maps onto three views, i.e., the front, side, and top views. Under this framework, features can be extracted from each view separately, and the three views are then combined to derive a complete description of the 3D data. To study the effectiveness of the TriViews framework, we extract five different features: spatiotemporal interest points (STIP), dense trajectory shape (DT-Shape), dense trajectory motion boundary histograms (DT-MBH), skeleton trajectory shape (ST-Shape), and skeleton trajectory motion boundary histograms (ST-MBH). The first three are representative features for action recognition in intensity data, adapted here to depth sequences. The last two, which we propose, are skeleton-based features unique to 3D depth data. RGB-D sensors such as the Kinect provide the 3D positions of 20 skeleton joints, and the evolution of each joint over time corresponds to one skeleton trajectory. Features aligned with the skeleton trajectories, namely a shape descriptor (ST-Shape) and motion boundary histograms (ST-MBH), are extracted to characterize actions with sparse trajectories. The five features characterize action patterns from different aspects; the three best-performing features are selected and fused using a probabilistic fusion approach (PFA). We evaluate the proposed framework on three challenging depth action datasets. The experimental results show that the TriViews framework achieves the most accurate results for depth-based action recognition, outperforming state-of-the-art methods on all three databases. (C) 2014 Elsevier Inc. All rights reserved.
Pages: 182-191
Page count: 10