HiFi-123: Towards High-Fidelity One Image to 3D Content Generation

Cited by: 0

Authors:

Yu, Wangbo ^{[1,2]}

Yuan, Li ^{[1,2]}

Cao, Yan-Pei ^{[3]}

Gao, Xiangjun ^{[5]}

Li, Xiaoyu ^{[4]}

Hu, Wenbo ^{[4]}

Quan, Long ^{[5]}

Shan, Ying ^{[4]}

Tian, Yonghong ^{[1,2]}

Affiliations:

[1] Peking Univ, Sch Elect & Comp Engn, Shenzhen, Peoples R China

[2] Peng Cheng Lab, Shenzhen, Peoples R China

[3] VAST, Beijing, Peoples R China

[4] Tencent AI Lab, Shenzhen, Peoples R China

[5] Hong Kong Univ Sci & Technol, Sai Kung, Hong Kong, Peoples R China

Source:

COMPUTER VISION - ECCV 2024, PT LXXIII | 2025 / Vol. 15131

Keywords:

DOI:

10.1007/978-3-031-73464-9_16

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recent advances in diffusion models have enabled 3D generation from a single image. However, current methods often produce suboptimal results for novel views, with blurred textures and deviations from the reference image, limiting their practical applications. In this paper, we introduce HiFi-123, a method designed for high-fidelity and multi-view consistent 3D generation. Our contributions are twofold: First, we propose a Reference-Guided Novel View Enhancement (RGNV) technique that significantly improves the fidelity of diffusion-based zero-shot novel view synthesis methods. Second, capitalizing on the RGNV, we present a novel Reference-Guided State Distillation (RGSD) loss. When incorporated into the optimization-based image-to-3D pipeline, our method significantly improves 3D generation quality, achieving state-of-the-art performance. Comprehensive evaluations demonstrate the effectiveness of our approach over existing methods, both qualitatively and quantitatively. Video results are available on the project page.


Pages: 258-274

Page count: 17

