Fast and scalable all-optical network architecture for distributed deep learning

Cited by: 0
Authors
Li, Wenzhe [1]
Yuan, Guojun [1]
Wang, Zhan [1]
Tan, Guangming [1]
Zhang, Peiheng [1,2]
Rouskas, George N. [3]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, 6 Kexueyuan South Rd Zhongguancun, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Intelligent Comp Technol, 88 Jinji Lake Ave, Ind Pk, Suzhou, Peoples R China
[3] North Carolina State Univ, Dept Comp Sci, 890 Oval Dr, Raleigh, NC 27695 USA
Funding
US National Science Foundation; National Natural Science Foundation of China
Keywords
Performance; Operations
DOI
10.1364/JOCN.511696
CLC number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
With the ever-increasing size of training models and datasets, network communication has emerged as a major bottleneck in distributed deep learning training. To address this challenge, we propose an optical distributed deep learning (ODDL) architecture, which uses a fast yet scalable all-optical network to accelerate distributed training. A key feature of the architecture is flow-based transmit scheduling with fast reconfiguration, which allows ODDL to dynamically allocate a dedicated optical path to each traffic stream, resulting in low network latency and high network utilization. Additionally, ODDL provides physically isolated network resources tailored to each training task by reconfiguring the optical switch using LCoS-WSS (liquid crystal on silicon wavelength-selective switch) technology. The ODDL topology also uses tunable transceivers to adapt to time-varying traffic patterns. To achieve accurate and fine-grained scheduling of optical circuits, we propose an efficient distributed control scheme that incurs minimal delay overhead. Evaluations on real-world traces show that, with 1024 nodes and 100 Gbps bandwidth, ODDL accelerates VGG19 training by 1.6x and 1.7x compared with conventional fat-tree electrical networks and photonic SiP-Ring architectures, respectively. We further build a four-node testbed, on which ODDL achieves training times comparable to those of an ideal electrical switching network. © 2024 Optica Publishing Group
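The idea of flow-based transmit scheduling, in which each traffic stream receives a dedicated optical path and conflicting streams wait for a later reconfiguration, can be illustrated with a toy model. The sketch below is not the ODDL scheduler. It assumes, hypothetically, that every node has exactly one tunable transmitter and one receiver, so a single optical configuration is a partial permutation (each source and each destination is used by at most one circuit). The function `schedule_flows` and the `(flow_id, src, dst)` flow representation are illustrative inventions.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical flow representation: (flow_id, source node, destination node).
Flow = Tuple[int, int, int]


def schedule_flows(flows: List[Flow]) -> List[Dict[int, Tuple[int, int]]]:
    """Greedy flow-based circuit scheduling (illustrative sketch only).

    Assumes one tunable transmitter and one receiver per node, so each
    configuration (round) is a partial permutation of nodes. Every flow
    gets a dedicated circuit for a whole round; flows whose source or
    destination is already in use are deferred to the next reconfiguration.
    Returns one {flow_id: (src, dst)} mapping per round.
    """
    pending = deque(flows)
    rounds: List[Dict[int, Tuple[int, int]]] = []
    while pending:
        busy_tx: set = set()  # sources whose transmitter is already tuned this round
        busy_rx: set = set()  # destinations whose receiver is already taken this round
        config: Dict[int, Tuple[int, int]] = {}
        deferred: deque = deque()
        while pending:
            fid, src, dst = pending.popleft()
            if src in busy_tx or dst in busy_rx:
                deferred.append((fid, src, dst))  # conflict: wait for a later round
            else:
                busy_tx.add(src)
                busy_rx.add(dst)
                config[fid] = (src, dst)  # dedicated optical path for this flow
        rounds.append(config)  # at least one flow is placed per round, so this terminates
        pending = deferred
    return rounds


if __name__ == "__main__":
    # Ring all-reduce step among 4 nodes: each node sends to its neighbor.
    ring = [(i, i, (i + 1) % 4) for i in range(4)]
    print(len(schedule_flows(ring)))  # 1 configuration serves all four flows

    # Incast: three senders target node 1, so they must be serialized.
    incast = [(0, 0, 1), (1, 2, 1), (2, 3, 1)]
    print(len(schedule_flows(incast)))  # 3 configurations
```

In this toy model, permutation-like collective traffic (such as a ring all-reduce step) fits in a single configuration, while many-to-one traffic forces extra reconfigurations. This is one intuition for why fast reconfiguration and tunable transceivers matter for time-varying training traffic.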
Pages: 342-357
Number of pages: 16