CF-SIS: Semantic-Instance Segmentation of 3D Point Clouds by Context Fusion with Self-Attention

Cited by: 23
Authors
Wen, Xin [1]
Han, Zhizhong [2]
Youk, Geunhyuk [1]
Liu, Yu-Shen [3]
Affiliations
[1] Tsinghua Univ, Sch Software, Beijing, Peoples R China
[2] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[3] Tsinghua Univ, Sch Software, BNRist, Beijing, Peoples R China
Source
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020
Funding
National Key R&D Program of China
Keywords
3D shape recognition; 3D shape segmentation; point cloud
DOI
10.1145/3394171.3413829
Chinese Library Classification (CLC) Number
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
3D Semantic-Instance Segmentation (SIS) is a newly emerging research direction that aims to understand the visual information of 3D scenes at both the semantic and the instance level. The main difficulty lies in reconciling the tension between mutual aid and the sub-optimal problem. Previous methods usually realize the mutual aid between instances and semantics through direct feature fusion or hand-crafted constraints that share the common knowledge of the two tasks. However, they neglect the abundant common knowledge carried by feature context in the feature space. Moreover, direct feature fusion can cause the sub-optimal problem, since a false prediction of an instance object can interfere with the semantic segmentation prediction, and vice versa. To address these two issues, we propose a novel feature context fusion network for the SIS task, named CF-SIS. The idea is to associatively learn semantic and instance segmentation of 3D point clouds by context fusion with attention in the feature space. Our main contributions are two context fusion modules. First, we propose a novel inter-task context fusion module that takes full advantage of mutual aid and relieves the sub-optimal problem. It extracts the context in the feature space from one task with attention, and selectively fuses that context into the other task using a gate fusion mechanism. Then, to enhance the mutual aid effect, we design an intra-task context fusion module that further integrates the fused context by selectively merging similar features through the self-attention mechanism. We conduct experiments on the S3DIS and ShapeNet datasets and show that CF-SIS outperforms state-of-the-art methods on both the semantic and the instance segmentation task.
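The abstract names two mechanisms: inter-task context fusion (attention over the other task's features, admitted through a gate) and intra-task context fusion (self-attention that merges similar features). The following is a minimal pure-Python sketch of those two ideas under stated assumptions; it is not the CF-SIS implementation. The function names (`inter_task_fusion`, `intra_task_fusion`), the use of scaled dot-product attention, the sigmoid gate computed from the concatenated [target; context] features, and the residual additions are all hypothetical choices, since this record does not give the paper's equations.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = points, columns = feature channels


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    # Scaled dot-product attention (an assumption of this sketch):
    # each query point attends over all key points and averages their values
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))])
    return out


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def inter_task_fusion(f_target: Matrix, f_source: Matrix,
                      w_gate: Matrix, b_gate: List[float]) -> Matrix:
    # Hypothetical inter-task module.
    # 1) Extract context from the source task, queried by the target task's features
    context = attention(f_target, f_source, f_source)
    fused = []
    for t, c in zip(f_target, context):
        z = t + c  # list concatenation [target; context], length 2C
        # 2) Per-channel gate in (0, 1): how much of the other task's context to admit
        gate = [sigmoid(sum(zi * w_gate[i][j] for i, zi in enumerate(z)) + b_gate[j])
                for j in range(len(t))]
        # 3) Residual gated fusion (assumed): a closed gate leaves the target unchanged
        fused.append([ti + g * ci for ti, g, ci in zip(t, gate, c)])
    return fused


def intra_task_fusion(f: Matrix) -> Matrix:
    # Hypothetical intra-task module: self-attention within one task, so each
    # point merges features of points similar to it (residual add assumed)
    ctx = attention(f, f, f)
    return [[a + b for a, b in zip(row, crow)] for row, crow in zip(f, ctx)]


# Toy usage: 4 points, 3 channels per task branch (arbitrary sizes and values)
f_sem = [[0.1 * (i + j) for j in range(3)] for i in range(4)]
f_ins = [[0.2 * (i - j) for j in range(3)] for i in range(4)]
w_gate = [[0.01 * (i - j) for j in range(3)] for i in range(6)]  # shape (2C, C)
b_gate = [0.0, 0.0, 0.0]

# Fuse in both directions, then refine each branch with self-attention
sem_out = intra_task_fusion(inter_task_fusion(f_sem, f_ins, w_gate, b_gate))
ins_out = intra_task_fusion(inter_task_fusion(f_ins, f_sem, w_gate, b_gate))
```

Running the inter-task step in both directions mirrors the "and vice versa" mutual aid in the abstract. In this sketch, the gate is what lets one branch down-weight unreliable context from the other task, which is one plausible reading of how gate fusion relieves the sub-optimal problem.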
Pages: 1661-1669
Number of pages: 9