Spectral Tensor Layers for Communication-Free Distributed Deep Learning

Cited: 0
Authors
Liu, Xiao-Yang [1 ,2 ]
Wang, Xiaodong [1 ]
Yuan, Bo [3 ]
Han, Jiashu [1 ]
Affiliations
[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
[2] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
[3] Rutgers State Univ, Dept Elect & Comp Engn, Piscataway, NJ 08854 USA
Keywords
Communication-free; deep learning; federated learning (FL); linear transform; multiresolution heterogeneous data; spectral tensor layer; tensor; STRATEGIES;
DOI
10.1109/TNNLS.2024.3394861
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this article, we propose a novel spectral tensor layer for communication-free distributed deep learning. The overall framework is as follows: first, we represent the data in tensor form (instead of vector form) and replace the matrix product in conventional neural networks with the tensor product, which in effect imposes a certain transform-induced structure on the original weight matrices, e.g., a block-circulant structure; then, we apply a linear transform along a certain dimension to split the original dataset into multiple spectral subdatasets; as a result, the proposed spectral tensor network consists of parallel branches, where each branch is a conventional neural network trained on a spectral subdataset with zero communication cost. The parallel branches are directly ensembled (i.e., a weighted sum of their outputs) to generate an overall network with substantially stronger generalization capability than that of any individual branch. Moreover, compared with traditional networks, the proposed method enjoys a decentralization gain in both memory and computation as a byproduct. It is a natural yet elegant solution for heterogeneous data in federated learning (FL), where data at different nodes have different resolutions. Finally, we evaluate the proposed spectral tensor networks on the MNIST, CIFAR-10, ImageNet-1K, and ImageNet-21K datasets to verify that they simultaneously achieve communication-free distributed learning, distributed storage reduction, parallel computation speedup, and learning with multiresolution data.
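The pipeline described in the abstract — transform along one mode, train independent branches on the resulting spectral slices, then ensemble their outputs — can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the DFT stands in for the generic linear transform, a least-squares fit stands in for per-branch network training, and all shapes, names, and the uniform ensemble weights are assumptions for the sake of a runnable toy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data tensor: N samples, each a d-by-k matrix (tensor form, not a flat vector).
N, d, k = 64, 8, 4
X = rng.standard_normal((N, d, k))
y = rng.integers(0, 2, size=N)  # binary labels, purely for illustration

# 1) A linear transform (here the DFT) along the third mode splits the dataset
#    into k spectral sub-datasets, one per frequency slice.
X_hat = np.fft.fft(X, axis=2)                    # shape (N, d, k), complex
subdatasets = [X_hat[:, :, j] for j in range(k)]

def real_features(Z):
    # Stack real and imaginary parts so each complex slice becomes real features.
    return np.concatenate([Z.real, Z.imag], axis=1)

# 2) Each branch is trained independently on its own slice, with zero
#    communication between branches (least-squares stands in for SGD).
def train_branch(Z, y):
    A = real_features(Z)
    w, *_ = np.linalg.lstsq(A, y.astype(float), rcond=None)
    return w

branch_weights = [train_branch(Z, y) for Z in subdatasets]

# 3) Ensemble: a weighted sum of the branch outputs (uniform weights here).
def predict(X_new):
    Xh = np.fft.fft(X_new, axis=2)
    outs = [real_features(Xh[:, :, j]) @ w for j, w in enumerate(branch_weights)]
    return np.mean(outs, axis=0)

scores = predict(X)
acc = np.mean((scores > 0.5) == y)
```

Because the branches never exchange gradients or activations, each one could live on a different node; only the final outputs are combined, which is the communication-free property the article emphasizes.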
Pages: 7237-7251
Page count: 15