APSNet: Toward Adaptive Point Sampling for Efficient 3D Action Recognition

Cited by: 14

Authors:

Liu, Jiaheng ^{[1]}

Guo, Jinyang ^{[2]}

Xu, Dong ^{[3,4]}

Affiliations:

[1] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China

[2] Beihang Univ, Inst Artificial Intelligence, Beijing 100191, Peoples R China

[3] Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW 2006, Australia

[4] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2022 / Vol. 31

Keywords:

Three-dimensional displays; Point cloud compression; Feature extraction; Videos; Geometry; Data mining; Task analysis; 3D action recognition; point cloud; accuracy-efficiency trade-off; NET;

DOI:

10.1109/TIP.2022.3193290

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Observing that it is still a challenging task to deploy 3D action recognition methods in real-world scenarios, in this work, we investigate the accuracy-efficiency trade-off for 3D action recognition. We first introduce a simple and efficient backbone network structure for 3D action recognition, in which we directly extract the geometry and motion representations from the raw point cloud videos through a set of simple operations (i.e., coordinate offset generation and mini-PointNet). Based on the backbone network, we propose an end-to-end optimized network called adaptive point sampling network (APSNet) to achieve the accuracy-efficiency trade-off, which mainly consists of three stages: the coarse feature extraction stage, the decision making stage, and the fine feature extraction stage. In APSNet, we adaptively decide the optimal resolutions (i.e., the optimal number of points) for each pair of frames based on any input point cloud video under the given computational complexity constraint. Comprehensive experiments on multiple benchmark datasets demonstrate the effectiveness and efficiency of our newly proposed APSNet for 3D action recognition.
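
The two backbone operations named in the abstract can be illustrated with a rough sketch. The abstract does not specify how they are implemented, so everything below is an assumption made for illustration: offsets are taken to the nearest neighbour in the next frame, the "mini-PointNet" is reduced to one shared linear layer with ReLU followed by max pooling, the resolution `n` is fixed by hand rather than chosen adaptively, and all function names are hypothetical. It is not the authors' code.

```python
import random

def sample_points(frame, n, rng):
    """Keep n randomly chosen points (stand-in for any sampling scheme)."""
    return rng.sample(frame, n)

def coordinate_offsets(src, dst):
    """For each point in src, return the offset to its nearest neighbour in dst."""
    offsets = []
    for p in src:
        q = min(dst, key=lambda c: sum((a - b) ** 2 for a, b in zip(p, c)))
        offsets.append(tuple(b - a for a, b in zip(p, q)))
    return offsets

def mini_pointnet(features, weights, bias):
    """Apply one shared linear layer + ReLU to every point, then max-pool per channel."""
    per_point = [
        [max(0.0, sum(w * x for w, x in zip(row, f)) + b)
         for row, b in zip(weights, bias)]
        for f in features
    ]
    return [max(channel) for channel in zip(*per_point)]

# Toy pair of frames: frame_b is frame_a shifted by 0.1 along x.
rng = random.Random(0)
frame_a = [(rng.random(), rng.random(), rng.random()) for _ in range(64)]
frame_b = [(x + 0.1, y, z) for x, y, z in frame_a]

n = 16  # resolution for this frame pair (hand-picked here, not learned)
src = sample_points(frame_a, n, rng)
offsets = coordinate_offsets(src, frame_b)

# Per-point input: geometry (xyz) concatenated with motion (offset).
features = [p + o for p, o in zip(src, offsets)]

# Random weights stand in for trained parameters (6 inputs -> 8 channels).
weights = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(8)]
bias = [0.0] * 8
descriptor = mini_pointnet(features, weights, bias)
```

Lowering `n` shrinks both the nearest-neighbour search and the per-point layer, which is the kind of accuracy-efficiency trade-off the abstract describes; in APSNet that choice is made per frame pair by a learned decision stage rather than by hand.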

Pages: 5287-5302

Page count: 16
