Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis

Authors
Gu, Yuchao [1]
Wang, Xintao [2]
Ge, Yixiao [2]
Shan, Ying [2]
Shou, Mike Zheng [1]
Affiliations
[1] National University of Singapore, Show Lab, Singapore, Singapore
[2] Tencent PCG, ARC Lab, Shenzhen, People's Republic of China
Funding
National Research Foundation, Singapore
DOI
10.1109/CVPR52733.2024.00729
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vector-quantized (VQ-based) generative models usually consist of two basic components: VQ tokenizers and generative transformers. Prior research focuses on improving the reconstruction fidelity of VQ tokenizers but rarely examines how improved reconstruction affects the generation ability of generative transformers. In this paper, we find, surprisingly, that improving the reconstruction fidelity of VQ tokenizers does not necessarily improve generation. Instead, learning to compress semantic features within VQ tokenizers significantly improves generative transformers' ability to capture textures and structures. We therefore highlight two competing objectives of VQ tokenizers for image synthesis: semantic compression and details preservation. Unlike previous work, which prioritizes better details preservation, we propose Semantic-Quantized GAN (SeQ-GAN), trained in two learning phases to balance the two objectives. In the first phase, we propose a semantic-enhanced perceptual loss for better semantic compression. In the second phase, we fix the encoder and codebook but enhance and finetune the decoder to achieve better details preservation. SeQ-GAN significantly improves VQ-based generative models for both unconditional and conditional image generation. Specifically, SeQ-GAN achieves a Fréchet Inception Distance (FID) of 6.25 and an Inception Score (IS) of 140.9 on 256×256 ImageNet generation, a remarkable improvement over ViT-VQGAN (714M), which obtains 11.2 FID and 97.2 IS.
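The tokenizer step named above (an encoder whose features are snapped to a codebook, producing the discrete tokens a generative transformer models) can be illustrated with a minimal, generic sketch of nearest-neighbour vector quantization. This is not SeQ-GAN's implementation: the function names `squared_distance` and `quantize`, the toy codebook, and the feature values are hypothetical, and real tokenizers operate on learned codebooks over encoder feature maps and train with straight-through gradients, which this sketch omits.

```python
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Map each encoder feature vector to its nearest codebook entry.

    Hypothetical helper for illustration only. Returns the token indices
    (what a generative transformer would model) and the quantized vectors
    (what a decoder would reconstruct the image from).
    """
    indices = []
    for f in features:
        # argmin over codebook entries of the squared L2 distance
        best = min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        indices.append(best)
    quantized = [codebook[k] for k in indices]
    return indices, quantized


# Toy 3-entry codebook of 2-D vectors and three "encoder features" (made-up values).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
features = [[0.1, -0.2], [0.9, 1.2], [0.2, 0.8]]
tokens, zq = quantize(features, codebook)
print(tokens)  # [0, 1, 2]
```

In this framing, the paper's second training phase corresponds to keeping `codebook` (and the encoder that produces `features`) fixed while only the decoder that consumes `zq` is enhanced and finetuned, so the token sequence the transformer learns does not change.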
Pages: 7631-7640
Number of pages: 10