Improving performance of deep learning models for 3D point cloud semantic segmentation via attention mechanisms

Cited by: 20
Authors
Vanian V. [1 ]
Zamanakos G. [1 ]
Pratikakis I. [1 ]
Affiliations
[1] Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi
Source
Computers and Graphics (Pergamon) | 2022, Vol. 106
Keywords
3D semantic segmentation; Attention mechanisms; Autonomous driving; Deep learning
DOI
10.1016/j.cag.2022.06.010
Abstract
3D semantic segmentation is a key element for a variety of applications in robotics and autonomous vehicles. For such applications, 3D data are usually acquired by LiDAR sensors, resulting in a point cloud: a set of points characterized by its unstructured form and inherent sparsity. For the task of 3D semantic segmentation, where the corresponding point clouds should be labeled with semantics, the current tendency is the use of deep learning neural network architectures for effective representation learning. On the other hand, various 2D and 3D computer vision tasks have used attention mechanisms, which result in an effective re-weighting of the already learned features. In this work, we aim to investigate the role of attention mechanisms in 3D semantic segmentation for autonomous driving, by identifying the significance of different attention mechanisms when adopted in existing deep learning networks. Our study is further supported by extensive experimentation on two standard datasets for autonomous driving, namely Street3D and SemanticKITTI, which permits conclusions to be drawn at both a quantitative and qualitative level. Our experimental findings show a clear advantage when attention mechanisms are adopted, resulting in superior performance. In particular, we show that the adoption of a Point Transformer in a SPVCNN network results in an architecture which outperforms the state of the art on the Street3D dataset. © 2022 Elsevier Ltd
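The abstract describes attention as a re-weighting of already learned features. As an illustration only (not the paper's implementation), the following minimal NumPy sketch shows one common form of this idea, a squeeze-and-excitation-style channel attention over per-point feature vectors; the weight matrices `w1` and `w2` are hypothetical learned parameters:

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Illustrative SE-style channel attention.

    features : (N, C) array of per-point features
    w1       : (H, C) bottleneck weights (hypothetical learned parameters)
    w2       : (C, H) expansion weights (hypothetical learned parameters)
    Returns the features re-weighted per channel by gates in (0, 1).
    """
    pooled = features.mean(axis=0)                   # "squeeze": (C,) global descriptor
    hidden = np.maximum(w1 @ pooled, 0.0)            # bottleneck with ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # "excitation": sigmoid gates in (0, 1)
    return features * gates                          # broadcast channel re-weighting
```

Because the sigmoid gates lie in (0, 1), the mechanism can only attenuate channels, emphasizing informative ones relative to the rest; the attention modules compared in the paper (e.g., the Point Transformer) are richer, but share this re-weighting principle.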
Pages: 277-287
Page count: 10
References
44 in total (first 10 shown)
[1]  
Qi C.R., Su H., Mo K., Guibas L.J., PointNet: Deep learning on point sets for 3D classification and segmentation, Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 652-660, (2017)
[2]  
Qi C.R., Yi L., Su H., Guibas L.J., Pointnet++: Deep hierarchical feature learning on point sets in a metric space, Advances in neural information processing systems, vol. 30, pp. 5099-5108, (2017)
[3]  
Milioto A., Vizzo I., Behley J., Stachniss C., Rangenet++: Fast and accurate lidar semantic segmentation, 2019 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp. 4213-4220, (2019)
[4]  
Maturana D., Scherer S., Voxnet: A 3D convolutional neural network for real-time object recognition, 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp. 922-928, (2015)
[5]  
Choy C., Gwak J., Savarese S., 4D spatio-temporal convnets: Minkowski convolutional neural networks, Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 3075-3084, (2019)
[6]  
Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L., Polosukhin I., Attention is all you need, Advances in neural information processing systems, vol. 30, pp. 5998-6008, (2017)
[7]  
Hu J., Shen L., Sun G., Squeeze-and-excitation networks, Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132-7141, (2018)
[8]  
Woo S., Park J., Lee J.-Y., Kweon I.S., CBAM: Convolutional block attention module, Proceedings of the European conference on computer vision (ECCV), pp. 3-19, (2018)
[9]  
Liu Z., Zhao X., Huang T., Hu R., Zhou Y., Bai X., TANet: Robust 3D object detection from point clouds with triple attention, Proceedings of the AAAI conference on artificial intelligence, pp. 11677-11684, (2020)
[10]  
Bhattacharyya P., Huang C., Czarnecki K., Self-attention based context-aware 3D object detection, (2021)