Point Spatio-Temporal Transformer Networks for Point Cloud Video Modeling

Cited by: 42
Authors
Fan, Hehe [1 ]
Yang, Yi [2 ]
Kankanhalli, Mohan [1 ]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore 119077, Singapore
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310058, Zhejiang, Peoples R China
Keywords
Point cloud compression; Three-dimensional displays; Transformers; Encoding; Computational modeling; Adaptation models; Solid modeling; Action recognition; point cloud; semantic segmentation; spatio-temporal modeling; video analysis
DOI
10.1109/TPAMI.2022.3161735
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Due to the inherent lack of order and the irregularity of point clouds, points emerge inconsistently across different frames of a point cloud video. To capture the dynamics in such videos, existing methods usually track points and limit the temporal modeling range in order to preserve spatio-temporal structure. However, because points may flow in and out across frames, computing accurate point trajectories is extremely difficult, especially for long videos. Moreover, when points move fast, they may escape from a region even within a small temporal window. In addition, using the same temporal range for different motions may fail to capture the temporal structure accurately. In this paper, we propose a Point Spatio-Temporal Transformer (PST-Transformer). To preserve the spatio-temporal structure, the PST-Transformer adaptively searches for related or similar points across the entire video by performing self-attention on point features. Moreover, the PST-Transformer is equipped with the ability to encode spatio-temporal structure. Because point coordinates are irregular and unordered while point timestamps are regular and ordered, the spatio-temporal encoding is decoupled to reduce the impact of spatial irregularity on temporal modeling. By properly preserving and encoding spatio-temporal structure, the PST-Transformer effectively models point cloud videos and achieves superior performance on 3D action recognition and 4D semantic segmentation.
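The decoupled spatio-temporal encoding described in the abstract can be illustrated with a short sketch. The following minimal PyTorch example is not the authors' implementation: the module name DecoupledSTAttention, the dimensions, and the way the two encodings are added to the point features are assumptions for illustration; it only shows the general pattern of encoding irregular coordinates and ordered timestamps separately before video-wide self-attention.

import torch
import torch.nn as nn

class DecoupledSTAttention(nn.Module):
    # Illustrative sketch, not the paper's code. Spatial coordinates
    # (irregular, unordered) and frame indices (regular, ordered) are
    # encoded by separate branches, so spatial irregularity does not
    # contaminate the temporal encoding.
    def __init__(self, dim, num_heads=4, max_frames=1024):
        super().__init__()
        self.spatial_enc = nn.Linear(3, dim)               # continuous xyz -> feature
        self.temporal_enc = nn.Embedding(max_frames, dim)  # discrete frame index -> feature
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feats, xyz, frame_ids):
        # feats: (B, N, C) features of points gathered from the whole clip
        # xyz: (B, N, 3) point coordinates; frame_ids: (B, N) integer timestamps
        q = feats + self.spatial_enc(xyz) + self.temporal_enc(frame_ids)
        # self-attention searches related/similar points across the entire video
        out, _ = self.attn(q, q, q)
        return out

# Toy usage: 2 clips, 256 points drawn from 16 frames, 64-d features.
layer = DecoupledSTAttention(dim=64)
feats = torch.randn(2, 256, 64)
xyz = torch.randn(2, 256, 3)
frame_ids = torch.randint(0, 16, (2, 256))
print(layer(feats, xyz, frame_ids).shape)  # torch.Size([2, 256, 64])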
Pages: 2181-2192
Page count: 12