DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation

Cited by: 1

Authors:

Ju, Xiaoliang ^{[1]}

Huang, Zhaoyang ^{[1]}

Li, Yijin ^{[2]}

Zhang, Guofeng ^{[2]}

Qiao, Yu ^{[3]}

Li, Hongsheng ^{[1,4]}

Affiliations:

[1] CUHK MMLab, Hong Kong, Peoples R China

[2] Zhejiang Univ, Hangzhou, Peoples R China

[3] Shanghai AI Lab, Shanghai, Peoples R China

[4] CPII InnoHK, Hong Kong, Peoples R China

Source:

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024 | 2024

Funding:

National Key Research and Development Program of China;

Keywords:

DOI:

10.1109/CVPR52733.2024.00433

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present DiffInDScene, a novel framework for high-quality 3D indoor scene generation, a problem that is challenging because of the complexity and diversity of indoor scene geometry. Although diffusion-based generative models have demonstrated impressive performance in image generation and object-level 3D generation, they have not yet been applied to room-level 3D generation because of their high computational cost. In DiffInDScene, we propose a cascaded 3D diffusion pipeline that is efficient and has strong generative performance for the Truncated Signed Distance Function (TSDF). The whole pipeline is designed to run on a sparse occupancy space in a coarse-to-fine fashion. Inspired by KinectFusion's incremental alignment and fusion of local TSDF volumes, we propose a diffusion-based SDF fusion approach that iteratively diffuses and fuses local TSDF volumes, facilitating the generation of an entire room environment. The generated results demonstrate that our work is capable of achieving high-quality room generation directly in three-dimensional space, starting from scratch. In addition to scene generation, the final part of DiffInDScene can be used as a post-processing module to refine 3D reconstruction results from multi-view stereo. According to the user study, the quality of the meshes generated by DiffInDScene can even exceed that of the ground-truth meshes provided by ScanNet.
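The abstract mentions KinectFusion-style fusion of local TSDF volumes as the inspiration for the proposed diffusion-based fusion. As a rough illustration of that classic idea only (not the authors' diffusion-based method), the sketch below fuses one local TSDF volume into a global volume by per-voxel weighted averaging. The names `fuse_tsdf`, `truncate`, and `TRUNC`, the flattened-list volume layout, and the assumption that the volumes are already aligned are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical truncation distance: TSDF values are clamped to [-TRUNC, TRUNC].
TRUNC = 1.0


def truncate(sdf: float, trunc: float = TRUNC) -> float:
    """Clamp a signed distance to the truncation band."""
    return max(-trunc, min(trunc, sdf))


def fuse_tsdf(
    global_tsdf: List[float],
    global_weight: List[float],
    local_tsdf: List[float],
    local_weight: List[float],
) -> Tuple[List[float], List[float]]:
    """Fuse a local TSDF volume into a global one by weighted averaging.

    All volumes are flattened voxel lists of equal length and are assumed
    to be aligned already. A weight of 0 marks a voxel as unobserved.
    """
    n = len(global_tsdf)
    if not (len(global_weight) == len(local_tsdf) == len(local_weight) == n):
        raise ValueError("all volumes must have the same number of voxels")

    fused_tsdf: List[float] = []
    fused_weight: List[float] = []
    for d_g, w_g, d_l, w_l in zip(global_tsdf, global_weight, local_tsdf, local_weight):
        w = w_g + w_l
        if w == 0:
            # Neither volume observed this voxel: keep the existing value.
            fused_tsdf.append(d_g)
        else:
            # Running weighted average of the truncated distances.
            fused_tsdf.append((w_g * d_g + w_l * truncate(d_l)) / w)
        fused_weight.append(w)
    return fused_tsdf, fused_weight


if __name__ == "__main__":
    tsdf, weight = fuse_tsdf(
        [0.5, -0.2, 1.0], [1.0, 1.0, 0.0],
        [0.1, -0.6, 3.0], [1.0, 3.0, 1.0],
    )
    print(tsdf, weight)
```

Calling `fuse_tsdf` repeatedly with successive local volumes accumulates weights, so voxels observed many times change less with each new observation. The pipeline described in the abstract differs in that the local volumes are produced by a diffusion model rather than measured by a depth sensor.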


Pages: 4526-4535

Page count: 10
