Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

Cited by: 0
Authors
Wang, Zhendong [1 ,2 ]
Jiang, Yifan [1 ]
Zheng, Huangjie [1 ,2 ]
Wang, Peihao [1 ]
He, Pengcheng [2 ]
Wang, Zhangyang [1 ]
Chen, Weizhu [2 ]
Zhou, Mingyuan [1 ]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] Microsoft Azure AI, Austin, TX 78759 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Keywords
(none listed)
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Diffusion models are powerful, but they require considerable time and data to train. We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce training time while improving data efficiency, thus helping democratize diffusion model training to a broader audience. At the core of our innovation is a new conditional score function at the patch level, where the patch location in the original image is included as additional coordinate channels, while the patch size is randomized and diversified throughout training to encode cross-region dependencies at multiple scales. Sampling with our method is as easy as with the original diffusion model. With Patch Diffusion, we achieve >= 2x faster training while maintaining comparable or better generation quality. Patch Diffusion also improves the performance of diffusion models trained on relatively small datasets, e.g., as few as 5,000 images to train from scratch. We achieve outstanding FID scores in line with state-of-the-art benchmarks: 1.77 on CelebA-64x64, 1.93 on AFHQv2-Wild-64x64, and 2.72 on ImageNet-256x256. We share our code and pre-trained models at https://github.com/Zhendong-Wang/Patch-Diffusion.
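The abstract describes training on patches whose location is encoded as extra coordinate channels and whose size is randomized across iterations. The sketch below illustrates that idea in NumPy; the function name, the patch-size set, and the [-1, 1] coordinate normalization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def random_patch_with_coords(image, patch_sizes=(16, 32, 64), rng=None):
    """Extract a random patch from `image` (C, H, W) and append two channels
    giving each pixel's normalized (x, y) location in the ORIGINAL image.

    A minimal sketch of the patch-wise conditioning idea; details such as
    the size set and normalization range are assumptions for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    c, h, w = image.shape
    # Randomize the patch size so training sees cross-region
    # dependencies at multiple scales.
    valid = [s for s in patch_sizes if s <= min(h, w)]
    p = int(rng.choice(valid))
    top = int(rng.integers(0, h - p + 1))
    left = int(rng.integers(0, w - p + 1))
    patch = image[:, top:top + p, left:left + p]
    # Coordinates in [-1, 1] relative to the full image, so the score
    # network knows where the patch came from.
    ys = np.linspace(-1.0, 1.0, h)[top:top + p]
    xs = np.linspace(-1.0, 1.0, w)[left:left + p]
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    coords = np.stack([xx, yy])              # (2, p, p)
    return np.concatenate([patch, coords])   # (C + 2, p, p)
```

A denoising network trained on such inputs simply takes C + 2 input channels; at sampling time the full-image coordinate grid is supplied, which is why sampling stays as easy as in a standard diffusion model.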
Pages: 18