Semantic Scene Completion With 2D and 3D Feature Fusion

Citations: 0
Authors
Park, Sang-Min [1]
Ha, Jong-Eun [2]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Grad Sch Automot Engn, Seoul 01811, South Korea
[2] Seoul Natl Univ Sci & Technol, Dept Mech & Automot Engn, Seoul 01811, South Korea
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
National Research Foundation of Singapore;
Keywords
Three-dimensional displays; Feature extraction; Semantics; Solid modeling; Transformers; Cameras; Estimation; Decoding; Proposals; Predictive models; Semantic scene completion; transformer; 3D scene understanding; occupancy;
DOI
10.1109/ACCESS.2024.3470754
CLC number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
3D semantic scene completion (SSC) aims to produce a dense semantic understanding of an environment in 3D. It requires both geometric and semantic knowledge of the surrounding environment, as well as the filling in of void areas. In this paper, we propose an improved algorithm that modifies VoxFormer. VoxFormer performs 3D semantic scene completion in two stages: first, it predicts the occupancy of the environment; then, it completes the semantic scene through a masked autoencoder. The two stages require separate training, which can cause a disconnect of information between input and output. We propose an improved VoxFormer algorithm that makes end-to-end training possible by integrating occupancy prediction and scene completion. We use pseudo-LiDAR computed from depth estimation as the input to a 3D CNN, which generates queries for cross-attention with 2D features. This connects occupancy prediction and semantic scene completion in a single end-to-end process. Experimental results on SemanticKITTI show that the proposed algorithm improves performance.
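The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a pinhole camera model for the pseudo-LiDAR step, omits the 3D CNN entirely (the queries below are random placeholders), and uses plain single-head scaled dot-product attention. All names and parameters (`fx`, `fy`, `cx`, `cy`, `origin`, `voxel_size`, `grid_shape`, and the function names) are hypothetical and chosen only for illustration.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (N, 3) point cloud (camera frame).

    Assumes a pinhole camera; pixels with non-positive depth are dropped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column, v: row
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def voxelize(points, origin, voxel_size, grid_shape):
    """Mark every voxel that contains at least one point as occupied."""
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    grid = np.zeros(grid_shape, dtype=bool)
    grid[tuple(idx[inside].T)] = True
    return grid

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention: each query attends to all keys."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values

# Toy example: a flat wall 2 m in front of a 4x6 pixel camera.
depth = np.full((4, 6), 2.0)
points = depth_to_pseudo_lidar(depth, fx=2.0, fy=2.0, cx=3.0, cy=2.0)
occupancy = voxelize(points, origin=(-3.0, -2.0, 0.0), voxel_size=1.0,
                     grid_shape=(6, 4, 4))

# Placeholder queries (one per occupied voxel) stand in for 3D CNN outputs;
# the random image features stand in for a 2D backbone's feature map.
rng = np.random.default_rng(0)
voxel_queries = rng.normal(size=(int(occupancy.sum()), 8))
image_features = rng.normal(size=(4 * 6, 8))
fused = cross_attention(voxel_queries, image_features, image_features)
```

In this sketch only occupied voxels issue queries, which mirrors the idea of letting a geometry estimate decide where 3D features should look up 2D image evidence. In an end-to-end setting the voxelization and query generation would be learned jointly with the attention step rather than trained separately.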
Pages: 141594 - 141603
Page count: 10