Fast and scalable all-optical network architecture for distributed deep learning

Cited by: 0
Authors
Li, Wenzhe [1]
Yuan, Guojun [1]
Wang, Zhan [1]
Tan, Guangming [1]
Zhang, Peiheng [1, 2]
Rouskas, George N. [3]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, 6 Kexueyuan South Rd Zhongguancun, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Intelligent Comp Technol, 88 Jinji Lake Ave,Ind Pk, Suzhou, Peoples R China
[3] North Carolina State Univ, Dept Comp Sci, 890 Oval Dr, Raleigh, NC 27695 USA
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
PERFORMANCE; OPERATIONS;
DOI
10.1364/JOCN.511696
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With the ever-increasing size of training models and datasets, network communication has emerged as a major bottleneck in distributed deep learning training. To address this challenge, we propose an optical distributed deep learning (ODDL) architecture. ODDL utilizes a fast yet scalable all-optical network architecture to accelerate distributed training. One of the key features of the architecture is its flow-based transmit scheduling with fast reconfiguration. This allows ODDL to allocate dedicated optical paths for each traffic stream dynamically, resulting in low network latency and high network utilization. Additionally, ODDL provides physically isolated and tailored network resources for training tasks by reconfiguring the optical switch using LCoS-WSS technology. The ODDL topology also uses tunable transceivers to adapt to time-varying traffic patterns. To achieve accurate and fine-grained scheduling of optical circuits, we propose an efficient distributed control scheme that incurs minimal delay overhead. Our evaluation on real-world traces showcases ODDL's remarkable performance. When implemented with 1024 nodes and 100 Gbps bandwidth, ODDL accelerates VGG19 training by 1.6x and 1.7x compared to conventional fat-tree electrical networks and photonic SiP-Ring architectures, respectively. We further build a four-node testbed, and our experiments show that ODDL can achieve a training time comparable to that of an ideal electrical switching network. © 2024 Optica Publishing Group
Pages: 342-357
Page count: 16