Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation

Cited by: 7
Authors
Wu, Zijie [1 ,3 ]
Wang, Yaonan [1 ]
Feng, Mingtao [2 ]
Xie, He [1 ]
Mian, Ajmal [3 ]
Affiliations
[1] Hunan Univ, Changsha, Peoples R China
[2] Xidian Univ, Xian, Peoples R China
[3] Univ Western Australia, Nedlands, WA 6009, Australia
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023
Funding
Australian Research Council
DOI
10.1109/ICCV51070.2023.00820
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Diffusion probabilistic models have achieved remarkable success in text-guided image generation. However, generating 3D shapes remains challenging due to the lack of sufficient data pairing 3D models with their descriptions. Moreover, text-based descriptions of 3D shapes are inherently ambiguous and lack detail. In this paper, we propose a sketch- and text-guided probabilistic diffusion model for colored point cloud generation that conditions the denoising process jointly on a hand-drawn sketch of the object and its textual description. We incrementally diffuse the point coordinates and color values in a joint diffusion process until they reach a Gaussian distribution. Colored point cloud generation thus amounts to learning the reverse diffusion process, conditioned on the sketch and text, to iteratively recover the desired shape and color. Specifically, to learn an effective sketch-text embedding, our model adaptively aggregates the joint embedding of the text prompt and the sketch via a capsule attention network. Our model uses staged diffusion to generate the shape and then assigns colors to different parts conditioned on the appearance prompt, while preserving the precise shape from the first stage. This gives our model the flexibility to extend to multiple tasks, such as appearance re-editing and part segmentation. Experimental results demonstrate that our model outperforms recent state-of-the-art methods in point cloud generation.
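The joint diffusion the abstract describes noises point coordinates and colors together until they reach a Gaussian distribution. As a minimal illustrative sketch (not the authors' implementation), the closed-form forward step q(x_t | x_0) of a standard DDPM can be applied to an (N, 6) array of XYZ + RGB values; the function name, toy point cloud, and linear noise schedule below are all assumptions for illustration:

```python
import numpy as np

def forward_diffuse(x0, t, betas):
    """Closed-form forward diffusion q(x_t | x_0) for a colored point cloud.

    x0:    (N, 6) array of XYZ coordinates and RGB colors, diffused jointly.
    t:     integer timestep index into the noise schedule.
    betas: (T,) variance schedule.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]            # cumulative product \bar{alpha}_t
    noise = np.random.randn(*x0.shape)           # Gaussian noise, same shape as x0
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return xt, noise

# Toy colored point cloud: 1024 points, XYZ in [-1, 1], RGB in [0, 1].
rng = np.random.default_rng(0)
x0 = np.concatenate([rng.uniform(-1, 1, (1024, 3)),
                     rng.uniform(0, 1, (1024, 3))], axis=1)
betas = np.linspace(1e-4, 0.02, 1000)            # linear schedule, T = 1000
xt, noise = forward_diffuse(x0, 999, betas)      # final step is near N(0, I)
print(xt.shape)  # (1024, 6)
```

The reverse (generation) process would train a conditional network to predict `noise` from `xt`, the timestep, and the sketch-text embedding; the paper's staged variant first recovers geometry and then colors.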
Pages: 8895-8905 (11 pages)