Fast and scalable all-optical network architecture for distributed deep learning

Authors
Li, Wenzhe [1]
Yuan, Guojun [1]
Wang, Zhan [1]
Tan, Guangming [1]
Zhang, Peiheng [1, 2]
Rouskas, George N. [3]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, 6 Kexueyuan South Rd Zhongguancun, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Intelligent Comp Technol, 88 Jinji Lake Ave,Ind Pk, Suzhou, Peoples R China
[3] North Carolina State Univ, Dept Comp Sci, 890 Oval Dr, Raleigh, NC 27695 USA
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
PERFORMANCE; OPERATIONS
DOI
10.1364/JOCN.511696
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With the ever-increasing size of training models and datasets, network communication has emerged as a major bottleneck in distributed deep learning training. To address this challenge, we propose an optical distributed deep learning (ODDL) architecture. ODDL utilizes a fast yet scalable all-optical network architecture to accelerate distributed training. One of the key features of the architecture is its flow-based transmit scheduling with fast reconfiguration. This allows ODDL to dynamically allocate a dedicated optical path to each traffic stream, resulting in low network latency and high network utilization. Additionally, ODDL provides physically isolated and tailored network resources for training tasks by reconfiguring the optical switch using LCoS-WSS technology. The ODDL topology also uses tunable transceivers to adapt to time-varying traffic patterns. To achieve accurate and fine-grained scheduling of optical circuits, we propose an efficient distributed control scheme that incurs minimal delay overhead. Our evaluation on real-world traces showcases ODDL's remarkable performance. When implemented with 1024 nodes and 100 Gbps bandwidth, ODDL accelerates VGG19 training by 1.6x and 1.7x compared to conventional fat-tree electrical networks and photonic SiP-Ring architectures, respectively. We further build a four-node testbed, and our experiments show that ODDL can achieve a training time comparable to that of an ideal electrical switching network. © 2024 Optica Publishing Group
Pages: 342-357
Number of pages: 16