MFFNet: multimodal feature fusion network for point cloud semantic segmentation

Cited by: 6
Authors
Ren, Dayong [1 ]
Li, Jiawei [1 ]
Wu, Zhengyi [1 ]
Guo, Jie [1 ]
Wei, Mingqiang [2 ]
Guo, Yanwen [1 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Nanjing 211106, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature fusion; Point cloud semantic segmentation;
DOI
10.1007/s00371-023-02907-w
Chinese Library Classification (CLC)
TP31 [Computer Software];
Subject Classification Code
081202; 0835;
Abstract
We introduce a multimodal feature fusion network (MFFNet) for 3D point cloud semantic segmentation. Unlike previous methods that learn directly from colored point clouds (XYZRGB), MFFNet transforms point clouds into 2D RGB-image and frequency-image representations for efficient multimodal feature fusion. For each point, MFFNet performs a local projection by automatically learning a weighted orthogonal projection that softly projects surrounding points onto 2D images. Regular 2D convolutions can thus be applied to these regular grids for efficient semantic feature learning. We then fuse the 2D semantic features into the 3D point cloud features with a multimodal feature fusion (MFF) module. The MFF module employs high-level features from the 2D RGB and frequency images to boost the intrinsic correlation and discriminability of the structural features extracted from the point cloud. In particular, the discriminative descriptions are quantified and leveraged as a local soft attention mask to further reinforce the structural features of the semantic categories. We evaluate the proposed method on the S3DIS and ScanNet datasets. Experimental results and comparisons with four backbone methods demonstrate that our framework performs better.
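As a rough illustration of the fusion step the abstract describes, the following is a minimal PyTorch sketch of a multimodal feature fusion block: per-point 3D features are concatenated with 2D RGB and frequency-image features already gathered back to each point, mixed by a shared MLP, and re-weighted by a learned soft attention mask. The module name MFFBlock, the layer sizes, and the attention formulation are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical MFF-style fusion block (sketch only, not the paper's code).
import torch
import torch.nn as nn


class MFFBlock(nn.Module):
    def __init__(self, point_dim: int, rgb_dim: int, freq_dim: int, out_dim: int):
        super().__init__()
        fused_dim = point_dim + rgb_dim + freq_dim
        # Shared MLP that mixes the concatenated multimodal features per point.
        self.fuse = nn.Sequential(
            nn.Linear(fused_dim, out_dim),
            nn.ReLU(inplace=True),
            nn.Linear(out_dim, out_dim),
        )
        # Predicts a per-point, per-channel soft attention mask in (0, 1).
        self.attn = nn.Sequential(
            nn.Linear(fused_dim, out_dim),
            nn.Sigmoid(),
        )

    def forward(self, point_feat, rgb_feat, freq_feat):
        # point_feat: (B, N, point_dim)  3D point-cloud features
        # rgb_feat:   (B, N, rgb_dim)    2D RGB features gathered per point
        # freq_feat:  (B, N, freq_dim)   2D frequency-image features per point
        x = torch.cat([point_feat, rgb_feat, freq_feat], dim=-1)
        fused = self.fuse(x)
        mask = self.attn(x)
        # The soft attention mask emphasises discriminative channels of the
        # fused feature, loosely mirroring the role described in the abstract.
        return fused * mask


if __name__ == "__main__":
    B, N = 2, 1024
    block = MFFBlock(point_dim=64, rgb_dim=64, freq_dim=64, out_dim=128)
    out = block(torch.randn(B, N, 64), torch.randn(B, N, 64), torch.randn(B, N, 64))
    print(out.shape)  # torch.Size([2, 1024, 128])
```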
Pages: 5155-5167
Number of pages: 13