3D Indoor Point Cloud Semantic Segmentation Using Image and Voxel

Cited by: 0
Authors
Yeom S.-S. [1]
Ha J.-E. [2]
Affiliations
[1] Graduate School of Automotive Engineering, Seoul National University of Science and Technology
[2] Department of Mechanical and Automotive Engineering, Seoul National University of Science and Technology
Source
Ha, Jong-Eun (jeha@seoultech.ac.kr) | Year 1600 / Institute of Control, Robotics and Systems, Vol. / Issue 27
Keywords
3D Vision; Point Cloud; Semantic Segmentation
DOI
10.5302/J.ICROS.2021.21.0142
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a parallel network architecture that improves performance by fusing two-dimensional (2D) and three-dimensional (3D) features. A voxel-based method and a projection-based method are combined so that results are derived from a single scan. The approach consists of two parallel networks that extract features in their respective dimensions and a fusion network that merges them. In the fusion network, the voxel blocks and 2D feature maps extracted by the two networks are fused onto the voxel grid and then trained through convolution. To train the 2D network effectively, we apply data augmentation based on rotation transformations of the coordinate system. In addition, a multi-loss with a weight applied to each dimension is employed, and the results show that the system performs better with it than with a single loss. The performance of the proposed method can be improved further by improving the 2D and 3D networks, so the approach can be generalized to other network structures. © ICROS 2021.
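The weighted multi-loss mentioned in the abstract could look roughly like the sketch below. It is only an illustration under stated assumptions: the abstract does not give the loss function, the weight values, or whether the fusion branch has its own loss term, so the cross-entropy loss, the function names `cross_entropy` and `multi_loss`, and the default weights `w_2d` and `w_3d` are all hypothetical.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class.

    probs:  list of per-sample class-probability lists
    labels: list of integer class ids, one per sample
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total += -math.log(max(p[y], eps))
    return total / len(labels)


def multi_loss(probs_2d, labels_2d, probs_3d, labels_3d, w_2d=0.5, w_3d=0.5):
    """Weighted sum of a 2D-branch loss and a 3D-branch loss.

    The weights w_2d and w_3d are assumed values, not taken from the paper.
    Returns (total, loss_2d, loss_3d) so each term can be monitored.
    """
    loss_2d = cross_entropy(probs_2d, labels_2d)
    loss_3d = cross_entropy(probs_3d, labels_3d)
    total = w_2d * loss_2d + w_3d * loss_3d
    return total, loss_2d, loss_3d


if __name__ == "__main__":
    # Toy predictions: one pixel for the 2D branch, one voxel for the 3D branch.
    total, l2, l3 = multi_loss([[0.5, 0.5]], [0], [[0.25, 0.75]], [0])
    print(total, l2, l3)
```

Returning the per-branch terms alongside the total makes it easy to compare this weighted sum with a single-loss baseline, which is the comparison the abstract reports.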
Pages: 1000-1007
Page count: 7