Resizing codebook of vector quantization without retraining

Citations: 0
Authors
Lei Li
Tingting Liu
Chengyu Wang
Minghui Qiu
Cen Chen
Ming Gao
Aoying Zhou
Affiliations
[1] Shanghai Engineering Research Center of Big Data Management, School of Data Science and Engineering, East China Normal University
[2] KLATASDS-MOE, School of Statistics, East China Normal University
[3] Alibaba Group
Source
Multimedia Systems | 2023 / Vol. 29
Keywords
Codebook resizing; Vector quantization; Hyperbolic embeddings; Hilbert curve;
DOI: Not available
Abstract
Large models pre-trained on massive data have become a flourishing paradigm in artificial intelligence systems. Recent works, such as M6, CogView, WenLan 2.0, NÜWA, and ERNIE-ViLG, further extend this paradigm to joint Vision-Language Pre-training (VLP). For VLP, the two-stage architecture is a popular design: the first stage learns an encoding function for the data, and the second stage learns a probabilistic model over the encoded representations. Vector quantization (VQ) is typically employed as the encoding function for image data in the first stage. VQ comprises a data structure (the codebook) and an algorithm (nearest-neighbor quantization). Publicly available VQ models (e.g., VQGAN, VQVAE, VQVAE2) ship with a codebook whose size is assigned empirically by their authors (e.g., 1024, 4096, or 16,384). If we want a smaller codebook for a lower computational load in the VQ process, or a larger codebook for better reconstruction quality, we have to retrain the VQ models, which consist of a down-sampling net, the codebook, and an up-sampling net. However, retraining VQ models is very expensive, since these models, with billions of parameters, are trained on massive datasets. This motivates us to find an approach to resize the codebook of vector quantization without retraining. In this paper, we leverage hyperbolic embeddings to enhance the codebook vectors with co-occurrence information and reorder the enhanced codebook along a Hilbert curve. We can then resize the codebook of vector quantization for a lower computational load or better reconstruction quality. Experimental results demonstrate the efficiency and effectiveness of our approach compared with competitive baselines. The code will be released to the public.
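The two VQ operations the abstract refers to — nearest-neighbor quantization against a codebook, and resizing by truncating a locality-preserving ordering — can be sketched as follows. This is a minimal illustration, not the paper's implementation: `quantize` and `resize_codebook` are hypothetical helper names, and the identity permutation below merely stands in for the Hilbert-curve ordering that the paper derives from hyperbolic co-occurrence embeddings.

```python
import numpy as np

def quantize(x, codebook):
    """Map each input vector to the index of its nearest codebook entry
    (the 'finding nearest quantization' step of VQ)."""
    # x: (n, d) batch of vectors; codebook: (K, d)
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (n, K)
    return d2.argmin(axis=1)

def resize_codebook(codebook, order, new_size):
    """Shrink a codebook without retraining by keeping the first
    `new_size` entries of a locality-preserving permutation `order`
    (e.g., one induced by a Hilbert curve), so that vectors dropped
    from the tail have near neighbors among the retained entries."""
    return codebook[order][:new_size]

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 4))   # toy codebook, K=16, d=4
x = rng.standard_normal((8, 4))           # toy encoder outputs

codes_full = quantize(x, codebook)        # indices in [0, 16)
small = resize_codebook(codebook, np.arange(16), 8)
codes_small = quantize(x, small)          # indices in [0, 8)
```

A smaller codebook halves the number of distance computations per lookup (lower computational load); conversely, a larger, well-ordered codebook admits finer quantization cells (better reconstruction quality).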
Pages: 1499–1512 (13 pages)