Semantic Scene Completion With 2D and 3D Feature Fusion

Cited by: 0

Authors:

Park, Sang-Min [1]

Ha, Jong-Eun [2]

Affiliations:

[1] Seoul Natl Univ Sci & Technol, Grad Sch Automot Engn, Seoul 01811, South Korea

[2] Seoul Natl Univ Sci & Technol, Dept Mech & Automot Engn, Seoul 01811, South Korea

Source:

IEEE Access | 2024 / Vol. 12

Funding:

National Research Foundation of Singapore;

Keywords:

Three-dimensional displays; Feature extraction; Semantics; Solid modeling; Transformers; Cameras; Estimation; Decoding; Proposals; Predictive models; Semantic scene completion; transformer; 3D scene understanding; occupancy;

DOI:

10.1109/ACCESS.2024.3470754

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

3D semantic scene completion (SSC) aims to obtain a dense semantic understanding of an environment in 3D. It requires geometric and semantic knowledge of the surrounding environment as well as the filling of void areas. In this paper, we propose an improved algorithm based on VoxFormer. VoxFormer performs 3D semantic scene completion in two stages: it first predicts the occupancy of the environment and then completes the semantic scene through a masked autoencoder. The two stages must be trained separately, which can cause a disconnect of information between input and output. We propose an improved VoxFormer algorithm that enables end-to-end training by integrating occupancy prediction and scene completion. We use pseudo-LiDAR computed from depth estimation as the input to a 3D CNN, which generates queries for cross-attention with 2D features. This connects occupancy prediction and semantic scene completion, making the whole process end-to-end. Experimental results on SemanticKITTI show that the proposed algorithm improves performance.
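The pseudo-LiDAR step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard pinhole camera model, and the function names (`depth_to_pseudo_lidar`, `voxelize`), the intrinsics (`fx`, `fy`, `cx`, `cy`), and the grid parameters are hypothetical choices made for illustration. It covers only the geometric input stage (depth map to point cloud to occupancy grid), not the 3D CNN or the cross-attention with 2D features.

```python
import numpy as np


def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (N, 3) camera-frame point cloud.

    Assumes a pinhole camera; pixels with non-positive depth are dropped.
    """
    h, w = depth.shape
    # u holds the column index and v the row index of every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def voxelize(points, origin, voxel_size, grid_shape):
    """Mark every voxel that contains at least one point as occupied."""
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(int)
    # Keep only points that fall inside the grid bounds.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    grid = np.zeros(grid_shape, dtype=bool)
    grid[tuple(idx[inside].T)] = True
    return grid


# Toy example: a 2x2 depth map with one invalid (zero-depth) pixel.
depth = np.array([[0.0, 2.0],
                  [2.0, 2.0]])
points = depth_to_pseudo_lidar(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
occupancy = voxelize(points, origin=(-2.0, -2.0, 0.0),
                     voxel_size=1.0, grid_shape=(4, 4, 4))
```

In a pipeline like the one the abstract describes, a binary grid of this kind would be the sort of volume passed to a 3D network; how the paper actually encodes the pseudo-LiDAR input is not stated in this record.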


Pages: 141594-141603

Page count: 10
