SiT: Exploring Flow and Diffusion-Based Generative Models with Scalable Interpolant Transformers

Cited by: 1
Authors
Ma, Nanye [1 ]
Goldstein, Mark [1 ]
Albergo, Michael S. [1 ]
Boffi, Nicholas M. [1 ]
Vanden-Eijnden, Eric [1 ]
Xie, Saining [1 ]
Affiliations
[1] NYU, New York, NY 10016 USA
DOI
10.1007/978-3-031-72980-5_2
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We present Scalable Interpolant Transformers (SiT), a family of generative models built on the backbone of Diffusion Transformers (DiT). The interpolant framework, which connects two distributions more flexibly than standard diffusion models, enables a modular study of the design choices affecting generative models built on dynamical transport: learning in discrete or continuous time, the objective function, the interpolant that connects the distributions, and deterministic versus stochastic sampling. By carefully introducing these ingredients, SiT surpasses DiT uniformly across model sizes on the conditional ImageNet 256x256 and 512x512 benchmarks using the exact same model structure, number of parameters, and GFLOPs. By exploring various diffusion coefficients, which can be tuned separately from learning, SiT achieves FID-50K scores of 2.06 and 2.62, respectively. Code is available at: https://github.com/willisma/SiT.
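To make the interpolant idea concrete, the following is a minimal, hypothetical sketch (not the authors' code) of the simplest case described in the abstract: a linear interpolant connecting a noise sample to a data sample, the velocity-field regression target it induces, and deterministic Euler sampling of the resulting ODE. All function names are illustrative assumptions.

```python
import numpy as np

def linear_interpolant(x0, x1, t):
    """Linear interpolant x_t = (1 - t) * x0 + t * x1 connecting noise x0 to data x1."""
    return (1.0 - t) * x0 + t * x1

def velocity_target(x0, x1):
    """For the linear interpolant, d/dt x_t = x1 - x0 is the velocity regression target."""
    return x1 - x0

def euler_sample(velocity_fn, x0, n_steps=100):
    """Deterministic sampling: integrate dx/dt = v(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = x0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Toy check: given the exact (here constant) velocity for one fixed (x0, x1) pair,
# integrating the ODE transports the noise sample x0 onto the data sample x1.
rng = np.random.default_rng(0)
x0, x1 = rng.standard_normal(4), rng.standard_normal(4)
xT = euler_sample(lambda x, t: velocity_target(x0, x1), x0)
```

In SiT the velocity field is instead a learned transformer, and (per the abstract) the interpolant schedule, training objective, and the diffusion coefficient used at sampling time can each be varied independently of the others.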
Pages: 23-40
Page count: 18
Related papers
(50 total)
  • [1] On Analyzing Generative and Denoising Capabilities of Diffusion-based Deep Generative Models
    Deja, Kamil
    Kuzina, Anna
    Trzcinski, Tomasz
    Tomczak, Jakub M.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [2] Speech Enhancement and Dereverberation With Diffusion-Based Generative Models
    Richter, Julius
    Welker, Simon
    Lemercier, Jean-Marie
    Lay, Bunlong
    Gerkmann, Timo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 2351-2364
  • [3] Elucidating the Design Space of Diffusion-Based Generative Models
    Karras, Tero
    Aittala, Miika
    Aila, Timo
    Laine, Samuli
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [4] Microstructure reconstruction using diffusion-based generative models
    Lee, Kang-Hyun
    Yun, Gun Jin
    MECHANICS OF ADVANCED MATERIALS AND STRUCTURES, 2024, 31(18): 4443-4461
  • [5] Scalable Diffusion Models with Transformers
    Peebles, William
    Xie, Saining
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 4172-4182
  • [6] A Variational Perspective on Diffusion-Based Generative Models and Score Matching
    Huang, Chin-Wei
    Lim, Jae Hyun
    Courville, Aaron
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [7] Diffusion-Based Graph Generative Methods
    Chen, Hongyang
    Xu, Can
    Zheng, Lingyu
    Zhang, Qiang
    Lin, Xuemin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36(12): 7954-7972
  • [8] Renormalization group flow, optimal transport, and diffusion-based generative model
    Sheshmani, Artan
    You, Yi-Zhuang
    Buyukates, Baturalp
    Ziashahabi, Amir
    Avestimehr, Salman
    PHYSICAL REVIEW E, 2025, 111 (01)
  • [9] Blind protein-ligand docking with diffusion-based deep generative models
    Corso, Gabriele
    Jing, Bowen
    Stark, Hannes
    Barzilay, Regina
    Jaakkola, Tommi
    BIOPHYSICAL JOURNAL, 2023, 122(3): 143A
  • [10] An optimal control perspective on diffusion-based generative modeling
    Berner, Julius
    Richter, Lorenz
    Ullrich, Karen
    arXiv, 2022,