Edge machine learning deploys learning algorithms at the wireless network edge to leverage massive mobile data for intelligent applications. The mainstream edge learning approach, federated learning, builds on distributed gradient descent: stochastic gradients are computed at edge devices and transmitted to an edge server to update a global AI model. Since each stochastic gradient is typically high-dimensional, communication overhead becomes a bottleneck for edge learning. To address this issue, we propose a novel framework of hierarchical gradient quantization and study its effect on the learning performance. First, the framework features a practical hierarchical architecture that decomposes the stochastic gradient into its norm and normalized block gradients, which are efficiently quantized using a uniform quantizer and a low-dimensional Grassmannian codebook, respectively. Subsequently, the quantized normalized block gradients are scaled by a so-called hinge vector and cascaded to yield the quantized normalized stochastic gradient; the hinge vector is itself compressed using another low-dimensional Grassmannian quantizer designed under the minimum-distortion criterion. The second feature of the framework is a bit-allocation scheme that reduces distortion by dividing the total quantization bits among the low-dimensional quantizers to determine their resolutions. The framework is proved to guarantee model convergence by analyzing the convergence rate as a function of the number of quantization bits. Furthermore, simulations show that our design substantially reduces the communication overhead compared with the state-of-the-art signSGD scheme while achieving similar learning accuracy.
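To make the hierarchical pipeline concrete, the following is a minimal NumPy sketch of the three-stage quantization outlined above, under simplifying assumptions: random unit-vector codebooks stand in for optimized Grassmannian codebooks, the bit split across quantizers is fixed rather than optimized, and all function names (e.g., hierarchical_quantize, grassmann_quantize) are illustrative rather than taken from the paper.

```python
import numpy as np

def random_grassmann_codebook(dim, num_bits, rng):
    """Illustrative stand-in for a Grassmannian codebook: random unit vectors.
    A real design would optimize codewords for maximum pairwise distance."""
    cb = rng.standard_normal((2 ** num_bits, dim))
    return cb / np.linalg.norm(cb, axis=1, keepdims=True)

def grassmann_quantize(v, codebook):
    """Map a unit vector to the codeword with the largest |inner product|
    (minimum chordal distortion). The recovered sign would be sent as one
    extra bit in practice."""
    scores = codebook @ v
    idx = int(np.argmax(np.abs(scores)))
    return np.sign(scores[idx]) * codebook[idx]

def uniform_quantize(x, num_bits, x_max):
    """Uniform scalar quantizer for the gradient norm on [0, x_max]."""
    levels = 2 ** num_bits
    step = x_max / levels
    return float(min(np.round(x / step), levels - 1)) * step

def hierarchical_quantize(g, block_dim=8, bits_norm=8, bits_block=8,
                          bits_hinge=8, norm_max=10.0, seed=0):
    """Sketch of hierarchical gradient quantization:
    1) quantize the gradient norm with a uniform quantizer;
    2) split the normalized gradient into blocks and quantize each normalized
       block with a low-dimensional codebook;
    3) quantize the hinge vector of block norms (itself unit-norm) with
       another low-dimensional codebook, then rescale and cascade."""
    rng = np.random.default_rng(seed)
    g = np.asarray(g, dtype=float)
    num_blocks = g.size // block_dim          # assumes g.size % block_dim == 0

    g_norm_q = uniform_quantize(np.linalg.norm(g), bits_norm, norm_max)
    u = g / (np.linalg.norm(g) + 1e-12)       # normalized stochastic gradient
    blocks = u.reshape(num_blocks, block_dim)

    block_cb = random_grassmann_codebook(block_dim, bits_block, rng)
    hinge_cb = random_grassmann_codebook(num_blocks, bits_hinge, rng)

    hinge = np.linalg.norm(blocks, axis=1)    # block norms form a unit vector
    hinge_q = np.abs(grassmann_quantize(hinge, hinge_cb))  # norms are nonnegative

    blocks_q = np.stack([
        grassmann_quantize(b / (np.linalg.norm(b) + 1e-12), block_cb)
        for b in blocks
    ])

    # Cascade: scale each quantized block by its quantized hinge entry,
    # then by the quantized gradient norm.
    return g_norm_q * (hinge_q[:, None] * blocks_q).reshape(-1)
```

In this sketch the total bit budget is simply bits_norm plus num_blocks times bits_block plus bits_hinge (plus sign bits); the bit-allocation scheme in the paper would instead split a given total budget across these quantizers to minimize the overall distortion.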