PTANet: Triple Attention Network for point cloud semantic segmentation

Cited by: 34
Authors
Cheng, Haozhe [1]
Lu, Jian [1]
Luo, Maoxin [1]
Liu, Wei [1]
Zhang, Kaibing [1]
Affiliations
[1] Xian Polytech Univ, Xian, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D point cloud; Semantic segmentation; Contextual representation; Self-attention mechanism; Convolutions
DOI
10.1016/j.engappai.2021.104239
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
For 3D point cloud semantic segmentation, mining more informative features to enrich the contextual representation is regarded as the key to better segmentation performance. Unfortunately, existing point cloud segmentation networks lack a comprehensive treatment of contextual information from both global and local perspectives, and therefore fail to fully exploit the contextual representation, which prevents fine-grained objects from being recognized accurately. This paper therefore proposes a neural network, dubbed PTANet, that effectively enriches the contextual representation to improve segmentation accuracy. PTANet comprises two simple and effective parts: a Triple Attention Block and a Density Scale Learning Strategy. The Triple Attention Block consists of three sub-modules: (1) a position attention module updates the feature maps by modeling the interdependency between the spatial positions of the points; (2) a channel attention module recalibrates the original features according to the correlation weights between the channels of the feature maps, enriching the contextual representation globally; (3) a local region attention module computes interdependence weights between local neighbors to further complement the local feature information. In addition, to alleviate the adverse effect of the non-uniform distribution of point clouds on inference, the Density Scale Learning Strategy applies kernel density estimation with an adaptive bandwidth to fit a density scale for each point. In particular, weighting the feature maps by this density scale also supplies density information to the local features. Experiments verify the effectiveness of PTANet: it obtains 86.1% mIoU on ShapeNet, 62.4% mIoU on ScanNetV2, and 87.9% OA on S3DIS.
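As a rough illustration of the attention mechanisms named above, the sketch below gives one plausible PyTorch form of the position and channel attention modules on a (B, C, N) point-cloud feature map. The class names, shapes, and channel reduction factor are illustrative assumptions, not the authors' implementation.

```python
# Illustrative PyTorch sketch of the position and channel attention
# modules described in the abstract; names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PositionAttention(nn.Module):
    """Models interdependency between the spatial positions of points:
    every point attends to every other point (feature map: B x C x N)."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv1d(channels, channels // 4, 1)  # query projection
        self.k = nn.Conv1d(channels, channels // 4, 1)  # key projection
        self.v = nn.Conv1d(channels, channels, 1)       # value projection
        self.gamma = nn.Parameter(torch.zeros(1))       # learned residual weight

    def forward(self, x):                               # x: (B, C, N)
        q = self.q(x).permute(0, 2, 1)                  # (B, N, C//4)
        k = self.k(x)                                   # (B, C//4, N)
        attn = F.softmax(torch.bmm(q, k), dim=-1)       # (B, N, N) point-to-point weights
        out = torch.bmm(self.v(x), attn.permute(0, 2, 1))  # (B, C, N)
        return self.gamma * out + x                     # residual update of the feature map


class ChannelAttention(nn.Module):
    """Recalibrates features with correlation weights between channels,
    enriching the contextual representation globally."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):                               # x: (B, C, N)
        attn = F.softmax(torch.bmm(x, x.permute(0, 2, 1)), dim=-1)  # (B, C, C)
        return self.gamma * torch.bmm(attn, x) + x      # re-weighted channels + residual
```

A local region attention module in the same style would compute these softmax weights only over each point's k-nearest neighbors rather than over all N points.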
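The density scale can likewise be sketched as a Gaussian kernel density estimate whose bandwidth adapts to each point's k-NN distances. The abstract does not specify the bandwidth rule, normalization, or neighborhood size, so the choices below (including k = 16) are assumptions for illustration only.

```python
# Illustrative kernel density estimate with an adaptive, k-NN-derived
# bandwidth; the paper's exact bandwidth rule is not given in the abstract.
import torch


def density_scale(xyz: torch.Tensor, k: int = 16) -> torch.Tensor:
    """xyz: (B, N, 3) point coordinates -> (B, N, 1) density scale in (0, 1]."""
    d2 = torch.cdist(xyz, xyz) ** 2                         # (B, N, N) squared distances
    knn_d2, _ = d2.topk(k + 1, dim=-1, largest=False)       # k nearest (plus self at 0)
    h2 = knn_d2[..., 1:].mean(dim=-1, keepdim=True) + 1e-8  # adaptive bandwidth^2 per point
    density = torch.exp(-d2 / (2 * h2)).mean(dim=-1, keepdim=True)  # Gaussian KDE
    return density / density.amax(dim=1, keepdim=True)      # normalize per cloud
```

One way to read "weighting the feature maps by this density scale" is an elementwise product, e.g. feats * density_scale(xyz).permute(0, 2, 1) for a (B, C, N) feature map, so that features in sparse regions are rescaled relative to dense ones.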
Pages: 12