Object-Aware Semantic Scene Completion Through Attention-Based Feature Fusion and Voxel-Points Representation

Cited by: 0
Authors
Miao, Yubin [1]
Wan, Junkang [1]
Luo, Junjie [1]
Wu, Hang [1]
Fu, Ruochong [1]
Affiliations
[1] Shanghai Jiao Tong University, School of Mechanical Engineering, Shanghai 200240, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Semantics; Three-dimensional displays; Feature extraction; Transformers; Point cloud compression; Image reconstruction; Task analysis; Image analysis; Scene classification; Deep learning; Shape measurement; Semantic scene completion; semantic segmentation; 3D scene reconstruction; deep learning; point transformer; point cloud; SEGMENTATION; NETWORK;
DOI
10.1109/ACCESS.2024.3370844
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Semantic scene completion is a computer vision technique that combines semantic segmentation and shape completion. Its purpose is to infer a complete 3D scene with semantic information from single-view RGB-D images. In recent years, some methods have adopted a voxel-points-based approach, converting voxelized scenes into point clouds to reduce the computational cost associated with 3D convolutions. However, most of these methods do not fully consider the geometric details of the objects in the scene. In this paper, we propose ASPNet (Attention-based Semantic Point Completion Network), a two-branch semantic scene completion algorithm that combines scene-level completion and object refinement. In the scene-level completion branch, we design the SPT (Semantic-based Point Transformer) module, which introduces semantic information into the traditional Point Transformer layer so that features are aggregated among neighboring keypoints of the same category. Using an object detection module and an object refinement module, ASPNet refines the rough semantic completion results obtained from direct encoding and decoding of the RGB-D inputs. The quantitative results show that ASPNet has much lower computational overhead than 3D convolution-based semantic scene completion algorithms, while its reconstruction results contain more geometric details.
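The idea of aggregating features only among neighboring keypoints of the same category can be illustrated with a minimal NumPy sketch. This is not the authors' SPT module: it assumes plain scalar dot-product attention over k nearest neighbors instead of the Point Transformer's vector attention and positional encoding, and the function name `semantic_neighbor_attention` and its parameters are hypothetical.

```python
import numpy as np

def semantic_neighbor_attention(xyz, feats, labels, k=4):
    """Aggregate each point's features over its k nearest same-label neighbors.

    Illustrative only: simplified scalar attention, not the paper's SPT layer.
    xyz: (n, 3) coordinates, feats: (n, c) features, labels: (n,) class ids.
    """
    # Pairwise squared distances between all points, shape (n, n).
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)
    # Force each point to be its own nearest neighbor, even with duplicates.
    np.fill_diagonal(d2, -1.0)
    # Indices of the k nearest neighbors, shape (n, k); column 0 is the point itself.
    knn = np.argsort(d2, axis=1)[:, :k]
    neigh = feats[knn]  # (n, k, c)
    # Attention logits: scaled dot product between a point and each neighbor.
    logits = np.einsum("nc,nkc->nk", feats, neigh) / np.sqrt(feats.shape[1])
    # Semantic mask: drop neighbors whose label differs from the center point.
    same = labels[knn] == labels[:, None]
    logits = np.where(same, logits, -np.inf)
    # Softmax over surviving neighbors; self is always kept, so no row is empty.
    logits = logits - logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    # Weighted sum of neighbor features, shape (n, c).
    return np.einsum("nk,nkc->nc", w, neigh)
```

In this sketch the labels would come from a coarse semantic prediction; masking the logits before the softmax means that points from a different category contribute exactly zero weight, so features do not leak across object boundaries.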
Pages: 31431-31442
Number of pages: 12