Resource Allocation in Flexible-Bandwidth Fine-Grained Optical Transport Networks for Geo-Distributed Machine Learning

Cited by: 1
Authors
Lian, Meng [1]
Zhao, Yongli [1]
Li, Xin [1]
Liu, Wenhong [1]
Li, Yajie [1]
Tornatore, Massimo [2]
Zhang, Jie [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Informat Photon & Opt Commun, Beijing 100876, Peoples R China
[2] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, I-20133 Milan, Italy
Funding
National Natural Science Foundation of China
Keywords
Bandwidth; Training; Resource management; Internet of Things; Synchronization; Heuristic algorithms; Protection; Optical network units; Biomedical optical imaging; Optical fibers; Fine-grained optical transport network (fgOTN); geo-distributed machine learning (GDML); network reconfiguration; optical network; resource allocation; COMMUNICATION-EFFICIENT; INTERNET; EDGE; SERVER; MODEL
DOI
10.1109/JIOT.2025.3558933
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Geo-distributed machine learning (GDML) enables collaborative learning among geographically dispersed data centers, meeting the demand for distributed, privacy-preserving training in large-scale Internet of Things applications. However, the efficiency of distributed training depends heavily on synchronized communication among the distributed models over bandwidth-limited wide area networks (WANs). The fine-grained optical transport network (fgOTN), with its adjustable-bandwidth connections, offers more flexible transmission and can support accurate synchronization across GDML tasks in WANs. Yet flexible bandwidth assignment and complex interdependencies among tasks make resource allocation for GDML in fgOTN challenging. Specifically, flexible bandwidth assignment exacerbates resource competition among task flows, which reduces learning efficiency. This article proposes resource allocation solutions for GDML in fgOTN. We first formulate the problem as a linear programming problem whose objective is to maximize the completion ratio of GDML tasks. We then propose GARA, a resource allocation algorithm based on a genetic algorithm, for GDML in fgOTN. GARA accounts for both task completion and bandwidth adjustment by generating its population from prior knowledge and adapting its mutation to the completion ratio. Simulations show that GARA prioritizes resource allocation for high-priority tasks to alleviate resource competition, achieving the highest task completion ratio while avoiding excessive network reconfiguration.
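The abstract names two ingredients of GARA: a population seeded from prior knowledge and a mutation rate that adapts to the completion ratio. The sketch below is only a toy illustration of those two ideas and is not the authors' algorithm. Everything in it is an assumption made for illustration: a single shared link of fixed capacity, one bandwidth demand and one priority per task, a greedy priority-ordered seed standing in for "prior knowledge", and the specific mutation-rate formula. Bandwidth adjustment and network reconfiguration are not modeled.

```python
import random

def completion_ratio(chrom, demands, capacity):
    """Fraction of tasks served; 0.0 if the selected tasks exceed capacity."""
    used = sum(d for bit, d in zip(chrom, demands) if bit)
    if used > capacity:
        return 0.0
    return sum(chrom) / len(chrom)

def seeded_individual(demands, priorities, capacity):
    """Assumed 'prior knowledge' seed: admit tasks greedily, highest priority first."""
    chrom = [0] * len(demands)
    used = 0
    for i in sorted(range(len(demands)), key=lambda i: -priorities[i]):
        if used + demands[i] <= capacity:
            chrom[i] = 1
            used += demands[i]
    return chrom

def toy_ga(demands, priorities, capacity, pop_size=20, generations=50, seed=0):
    """Toy genetic algorithm over task-admission bit vectors (hypothetical setup)."""
    rng = random.Random(seed)
    n = len(demands)

    def fitness(chrom):
        return completion_ratio(chrom, demands, capacity)

    # One seeded individual plus random individuals.
    pop = [seeded_individual(demands, priorities, capacity)]
    pop += [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]

    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        best = fitness(pop[0])
        # Assumed adaptive rule: mutate less as the best completion ratio rises.
        rate = 0.05 + 0.25 * (1.0 - best)
        parents = pop[: pop_size // 2]
        children = [pop[0][:]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < rate else bit for bit in child]
            children.append(child)
        pop = children

    return max(pop, key=fitness)

demands = [4, 3, 2, 5, 1]      # hypothetical bandwidth demand per task
priorities = [5, 1, 4, 2, 3]   # hypothetical priority per task
capacity = 10                  # hypothetical link capacity
best = toy_ga(demands, priorities, capacity)
print(best, completion_ratio(best, demands, capacity))
```

Because the best individual is carried over unchanged each generation, the returned solution is never worse than the greedy seed; in this toy setting the priorities influence only the seed, not the fitness.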
Pages: 25601-25619
Number of pages: 19