Latent Weight Quantization for Integerized Training of Deep Neural Networks

Cited: 0
Authors
Fei, Wen [1]
Dai, Wenrui [2]
Zhang, Liang [3]
Zhang, Luoming [4]
Li, Chenglin [1]
Zou, Junni [2]
Xiong, Hongkai [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[3] Donghua Univ, Sch Comp Sci & Technol, Shanghai 201620, Peoples R China
[4] Zhejiang Univ, Key Lab Biomed Engn, Minist Educ, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Quantization (signal); Training; Perturbation methods; Memory management; Hardware; Trajectory; Random access memory; Graphics processing units; Computational modeling; Noise; Integerized training; deep neural network quantization; latent weight; dual quantizer; large language models;
DOI
10.1109/TPAMI.2025.3527498
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Existing methods for integerized training speed up deep learning by using low-bitwidth integerized weights, activations, gradients, and optimizer buffers. However, they overlook the full-precision latent weights, which consume excessive memory to accumulate the gradient-based updates that optimize the integerized weights. In this paper, we propose the first latent weight quantization schema for general integerized training, which minimizes the quantization perturbation to the training process via residual quantization with an optimized dual quantizer. We leverage residual quantization to eliminate the correlation between the latent weights and the integerized weights and thereby suppress quantization noise. We further propose a dual quantizer with an optimal nonuniform codebook that avoids frozen weights and ensures a training trajectory that is statistically unbiased with respect to full-precision latent weights. The codebook is optimized to minimize the disturbance on weight updates under importance guidance and is realized with a three-segment polyline approximation for hardware-friendly implementation. Extensive experiments show that the proposed schema enables integerized training with latent weights as low as 4 bits for various architectures, including ResNets, MobileNetV2, and Transformers, with negligible performance loss in image classification and text generation. Furthermore, using the proposed schema, we successfully fine-tune large language models with up to 13 billion parameters on a single GPU.
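To make the idea of quantizing latent weights via residual quantization concrete, the sketch below stores each latent weight as an integerized weight plus a low-bit quantized residual instead of a persistent full-precision copy. This is only a rough illustration under simplifying assumptions: the function names, bit-widths, and the symmetric uniform quantizer are placeholders and do not reproduce the paper's optimized dual quantizer with a nonuniform three-segment codebook.

    # Rough illustration (not the paper's exact method): keep the latent weight as
    # an integerized weight plus a low-bit quantized residual, so that no
    # full-precision latent copy has to persist between training steps.
    import numpy as np

    def uniform_quantize(x, bits):
        # Symmetric uniform quantizer: returns integer codes and a per-tensor scale.
        qmax = 2 ** (bits - 1) - 1
        max_abs = float(np.max(np.abs(x)))
        scale = max_abs / qmax if max_abs > 0 else 1.0
        codes = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
        return codes, scale

    def encode_latent_weight(w_fp32, weight_bits=8, residual_bits=4):
        # Store the integerized weight and quantize only the residual
        # (latent minus integerized), following the residual-quantization idea.
        w_codes, w_scale = uniform_quantize(w_fp32, weight_bits)
        residual = w_fp32 - w_codes.astype(np.float32) * w_scale
        r_codes, r_scale = uniform_quantize(residual, residual_bits)
        return (w_codes, w_scale), (r_codes, r_scale)

    def decode_latent_weight(weight_state, residual_state):
        # Reconstruct an approximate latent weight for the next gradient update.
        (w_codes, w_scale), (r_codes, r_scale) = weight_state, residual_state
        return w_codes.astype(np.float32) * w_scale + r_codes.astype(np.float32) * r_scale

    # One SGD-style step with the quantized latent weight.
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.02, size=(4, 4)).astype(np.float32)
    weight_state, residual_state = encode_latent_weight(w)
    grad = rng.normal(0.0, 0.01, size=w.shape).astype(np.float32)
    w_next = decode_latent_weight(weight_state, residual_state) - 0.1 * grad
    weight_state, residual_state = encode_latent_weight(w_next)  # re-quantize for storage

In this simplified form, the float32 latent weight exists only transiently during the update step; only the integer codes and per-tensor scales are kept between steps, which is where the memory saving comes from.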
Pages: 2816 - 2832
Page count: 17
Related Papers
50 in total
  • [21] Privacy-Preserving Computation Offloading for Parallel Deep Neural Networks Training
    Mao, Yunlong
    Hong, Wenbo
    Wang, Heng
    Li, Qun
    Zhong, Sheng
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (07) : 1777 - 1788
  • [22] Local Critic Training for Model-Parallel Learning of Deep Neural Networks
    Lee, Hojung
    Hsieh, Cho-Jui
    Lee, Jong-Seok
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (09) : 4424 - 4436
  • [23] Dependency-Aware Tensor Scheduler for Industrial AI Applications: Dymem-An Aggressive Data-Swapping Policy for Training Nonlinear Deep Neural Networks
    Rang, Wei
    Yang, Donglin
    Cheng, Dazhao
    IEEE INDUSTRIAL ELECTRONICS MAGAZINE, 2022, 16 (02) : 15 - 23
  • [24] Training Deep Neural Networks Based on Unreliable Labels
    Bekker, Alan Joseph
    Goldberger, Jacob
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2682 - 2686
  • [25] Hierarchical Training of Deep Neural Networks Using Early Exiting
    Sepehri, Yamin
    Pad, Pedram
    Yuzuguler, Ahmet Caner
    Frossard, Pascal
    Dunbar, L. Andrea
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, : 1 - 15
  • [26] TRIM: A Design Space Exploration Model for Deep Neural Networks Inference and Training Accelerators
    Qi, Yangjie
    Zhang, Shuo
    Taha, Tarek M.
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2023, 42 (05) : 1648 - 1661
  • [27] Training-Free Deep Generative Networks for Compressed Sensing of Neural Action Potentials
    Sun, Biao
    Mu, Chaoxu
    Wu, Zexu
    Zhu, Xinshan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (10) : 5190 - 5199
  • [28] DCBT-Net: Training Deep Convolutional Neural Networks With Extremely Noisy Labels
    Olimov, Bekhzod
    Kim, Jeonghong
    Paul, Anand
    IEEE ACCESS, 2020, 8 : 220482 - 220495
  • [29] Neural Networks Integer Computation: Quantizing Convolutional Neural Networks of Inference and Training for Object Detection in Embedded Systems
    Xiao, Penghao
    Zhang, Chunjie
    Guo, Qian
    Xiao, Xiayang
    Wang, Haipeng
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 15862 - 15884
  • [30] Residual Quantization for Low Bit-Width Neural Networks
    Li, Zefan
    Ni, Bingbing
    Yang, Xiaokang
    Zhang, Wenjun
    Gao, Wen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 214 - 227