Learning point cloud context information based on 3D transformer for more accurate and efficient classification

Times cited: 4
Authors
Chen, Yiping [1]
Zhang, Shuai [1]
Lin, Weisheng [2]
Zhang, Shuhang [1]
Zhang, Wuming [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Geospatial Engn & Sci, Zhuhai 519082, Peoples R China
[2] Xiamen Univ, Fujian Key Lab Sensing & Comp Smart Cities, Xiamen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
classification; context information; point cloud; 3D transformer
DOI
10.1111/phor.12469
Chinese Library Classification (CLC)
P9 [Physical Geography]
Discipline Classification Codes
0705; 070501
Abstract
The point cloud semantic understanding task has made remarkable progress along with the development of 3D deep learning. However, aggregating spatial information to improve the local feature learning capability of the network remains a major challenge. Many methods have been proposed to improve local information learning, such as constructing a multi-area structure to capture information from different areas; however, such methods lose some local information because each point's features are learned independently. To solve this problem, a new network is proposed that considers the importance of the differences between points in a neighbourhood: the capture of local feature information is enhanced by highlighting the differing feature importance of points within the neighbourhood. First, a T-Net is constructed to learn a point cloud transformation matrix that compensates for the disorder of the input point cloud. Second, a transformer is used to mitigate the loss of local information caused by each point in the neighbourhood being processed independently. The experimental results show that an overall accuracy of 92.2% was achieved on the ModelNet40 dataset and 93.8% on the ModelNet10 dataset. The figure shows the point cloud classification pipeline, which is similar to that of PointNet: a T-Net is used to eliminate the effect of point cloud rotation, a 3D transformer module is utilised to learn point cloud context information and, finally, an MLP maps the features to the category dimension. Experiments show that the method is accurate and efficient.
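For readers who want a concrete picture of the pipeline the abstract describes (T-Net alignment, a transformer block for point context, then an MLP classification head), the following is a minimal PyTorch sketch. It is not the authors' released implementation: the module names (TNet, PointTransformerBlock), layer widths, and the use of nn.MultiheadAttention for the attention step are illustrative assumptions based on the PointNet-style structure the abstract outlines.

```python
# Minimal sketch of a PointNet-style classifier with a transformer context
# block, assuming standard PyTorch building blocks. Sizes are illustrative.
import torch
import torch.nn as nn


class TNet(nn.Module):
    """Predicts a 3x3 transformation matrix to align the input cloud (as in PointNet)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.fc = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 9))

    def forward(self, x):                          # x: (B, 3, N)
        feat = self.mlp(x).max(dim=2).values       # global max pool -> (B, 1024)
        mat = self.fc(feat).view(-1, 3, 3)
        # Bias towards the identity so early training keeps points near their input pose.
        return mat + torch.eye(3, device=x.device)


class PointTransformerBlock(nn.Module):
    """Self-attention over points: each point's feature becomes a weighted mix
    of the others, so features are no longer learned fully independently."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                          # x: (B, N, dim)
        out, _ = self.attn(x, x, x)                # attention weights act as per-point importance
        return self.norm(x + out)                  # residual connection


class PointCloudClassifier(nn.Module):
    def __init__(self, num_classes=40, dim=128):   # 40 classes, as in ModelNet40
        super().__init__()
        self.tnet = TNet()
        self.embed = nn.Conv1d(3, dim, 1)          # per-point feature embedding
        self.transformer = PointTransformerBlock(dim)
        self.head = nn.Sequential(                 # MLP mapping to the category dimension
            nn.Linear(dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, pts):                        # pts: (B, N, 3)
        aligned = pts @ self.tnet(pts.transpose(1, 2))               # apply learned 3x3 transform
        feat = self.embed(aligned.transpose(1, 2)).transpose(1, 2)   # (B, N, dim)
        feat = self.transformer(feat)              # context-aware point features
        return self.head(feat.max(dim=1).values)   # pool over points, classify


if __name__ == "__main__":
    logits = PointCloudClassifier()(torch.randn(2, 1024, 3))
    print(logits.shape)                            # torch.Size([2, 40])
```

The residual attention block is the part that lets each point's feature depend on its neighbours, which is the context aggregation the abstract credits for improving local feature learning.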
Pages: 603-616
Number of pages: 14
References
32 records in total
[1] [Anonymous]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[2] Chen, Min; Zhou, Chengyu; Lv, Qi; Zhu, Qing; Xu, Bo; Hu, Han; Ding, Yulin; Ge, Xuming; Chen, Jie; Guo, Xiaocui. A semantic segmentation method for vehicle-borne laser scanning point clouds in motorway scenes. Photogrammetric Record, 2023, 38(182): 94-117.
[3] Dai, Angela; Qi, Charles Ruizhongtai; Niessner, Matthias. Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 6545-6554.
[4] Devlin, J. BERT: pre-training of deep bidirectional transformers for language understanding. Preprint, 2018. DOI: 10.48550/arXiv.1810.04805.
[5] Dominguez, Miguel; Dhamdhere, Rohan; Petkar, Atir; Jain, Saloni; Sah, Shagan; Ptucha, Raymond. General-Purpose Deep Point Cloud Feature Extractor. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV 2018), 2018: 1972-1981.
[6]  
Dosovitskiy A., 2021, P INT C LEARN REPR, DOI DOI 10.48550/ARXIV.2010.11929
[7] Guo, Meng-Hao; Cai, Jun-Xiong; Liu, Zheng-Ning; Mu, Tai-Jiang; Martin, Ralph R.; Hu, Shi-Min. PCT: Point cloud transformer. Computational Visual Media, 2021, 7(2): 187-199.
[8] He, Kaiming; Chen, Xinlei; Xie, Saining; Li, Yanghao; Dollar, Piotr; Girshick, Ross. Masked Autoencoders Are Scalable Vision Learners. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 15979-15988.
[9] Kasaei, H. OrthographicNet: a deep transfer learning approach for 3D object recognition in open-ended domains. 2019, 26: 2910.
[10] Kazhdan, M. M. Symposium on Geometry Processing, 2003, 6.