GeometryMotion-Net: A Strong Two-Stream Baseline for 3D Action Recognition

被引：24

作者：

Liu, Jiaheng ^{[1
]}

Xu, Dong ^{[2
]}

机构：

[1] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China

[2] Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW 2006, Australia

来源：

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2021年 / 31卷 / 12期

基金：

澳大利亚研究理事会;

关键词：

Three-dimensional displays; Feature extraction; Geometry; Cloud computing; Skeleton; Data mining; Deep learning; Point cloud; 3D action recognition; two-stream;

D O I：

10.1109/TCSVT.2021.3101847

中图分类号：

TM [电工技术]; TN [电子技术、通信技术];

学科分类号：

0808 ; 0809 ;

摘要：

In this work, we propose a strong two-stream baseline method referred to as GeometryMotion-Net for 3D action recognition. For efficient 3D action recognition, we first represent each point cloud sequence as a limited number of randomly sampled frames with each frame consisting of a sparse set of points. After that, we propose a new two-stream framework for effective 3D action recognition. For the geometry stream, we propose a new module to produce a virtual overall geometry point cloud by first merging all 3D points from these selected frames, and then we exploit local neighborhood information of each point in the feature space. In the motion stream, for any two neighboring point cloud frames, we also propose a new module to generate one virtual forward motion point cloud and one virtual backward motion point cloud. Specifically, for each point in the current frame, we first produce a set of 3D offset features relative to the neighboring points in the reference frame (i.e., the previous/subsequent frame) and then exploit local neighborhood information of this point in the offset feature space. Based on the newly generated virtual overall geometry point cloud and multiple virtual forward/backward motion point clouds, any existing point cloud analysis methods (e.g., PointNet) can be readily adopted for extracting discriminant geometry and bidirectional motion features in the geometry and motion streams, respectively, which are further aggregated to make our two-stream network trainable in an end-to-end fashion. Comprehensive experiments on both large-scale datasets (i.e. NTU RGB+D 60 and NTU RGB+D 120) and small-scale datasets (i.e., N-UCLA and UWA3DII) demonstrate the effectiveness and efficiency of our two-stream network for 3D action recognition.

引用

页码：4711 / 4721

页数：11

共 51 条

[1]

[Anonymous], 2008, A spatio-temporal descriptor based on 3Dgradients

[2]

[Anonymous], 2016, LECT NOTES COMP VIII

[3] SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences [J].

Behley, Jens ;

Garbade, Martin ;

Milioto, Andres ;

Quenzel, Jan ;

Behnke, Sven ;

Stachniss, Cyrill ;

Gall, Juergen .

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :9296-9306

[4] Action Recognition with Dynamic Image Networks [J].

Bilen, Hakan ;

Fernando, Basura ;

Gavves, Efstratios ;

Vedaldi, Andrea .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (12) :2799-2813

[5] SkeleMotion: A New Representation of Skeleton Joint Sequences Based on Motion Information for 3D Action Recognition [J].

Caetano, Carlos ;

Sena, Jessica ;

Bremond, Francois ;

dos Santos, Jefersson A. ;

Schwartz, William Robson .

2019 16TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS), 2019,

[6] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J].

Carreira, Joao ;

Zisserman, Andrew .

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :4724-4733

[7] Learning Spatiotemporal Features with 3D Convolutional Networks [J].

Du Tran ;

Bourdev, Lubomir ;

Fergus, Rob ;

Torresani, Lorenzo ;

Paluri, Manohar .

2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :4489-4497

[8] SlowFast Networks for Video Recognition [J].

Feichtenhofer, Christoph ;

Fan, Haoqi ;

Malik, Jitendra ;

He, Kaiming .

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :6201-6210

[9] HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-scale Point Clouds [J].

Gu, Xiuye ;

Wang, Yijie ;

Wu, Chongruo ;

Lee, Yong Jae ;

Wang, Panqu .

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :3249-3258

[10] SO-Net: Self-Organizing Network for Point Cloud Analysis [J].

Li, Jiaxin ;

Chen, Ben M. ;

Lee, Gim Hee .

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :9397-9406

← 1 2 3 4 5 6 →