Distributionally Robust Federated Learning for Network Traffic Classification With Noisy Labels

Cited by: 1
Authors
Shi, Siping [1]
Guo, Yingya [2]
Wang, Dan [1]
Zhu, Yifei [3]
Han, Zhu [4, 5]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[2] Fuzhou Univ, Coll Comp & Data Sci, Fuzhou 350108, Fujian, Peoples R China
[3] Univ Michigan Shanghai Jiao Tong Univ Joint Inst, Cooperat Medianet Innovat Ctr CM, Shanghai 200240, Peoples R China
[4] Univ Houston, Dept Elect & Comp Engn, Houston, TX 77004 USA
[5] Kyung Hee Univ, Dept Comp Sci & Engn, Seoul 446701, South Korea
Keywords
Noise measurement; Telecommunication traffic; Training; Mobile handsets; Data models; Servers; Uncertainty; Network traffic classification; federated learning; distributionally robust optimization; OPTIMIZATION
DOI
10.1109/TMC.2023.3319657
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Network traffic classifiers for mobile devices are widely trained with federated learning (FL) for privacy preservation. Noisy labels commonly occur on each device and degrade the accuracy of the learned network traffic classifier. Existing noise elimination approaches attempt to solve this by detecting and removing noisy labeled data before training. However, they may lead to poor performance of the learned classifier, because little traffic data remains on each device after noise removal. Motivated by the observation that the features of the noisy labeled traffic data are clean and that the underlying true distribution of the noisy labeled data is statistically close to that of the clean traffic data, we propose to utilize the noisy labeled data by normalizing it to be close to the clean traffic data distribution. Specifically, we first formulate a distributionally robust federated network traffic classifier learning problem (DR-NTC) that jointly uses the normalized traffic data and the clean data in training. Then we specify the normalization function under the Wasserstein distance to transform the noisy labeled traffic data into a certified robust region around the clean data distribution, and we reformulate the DR-NTC problem into an equivalent DR-NTC-W problem. Finally, we design a robust federated network traffic classifier learning algorithm, RFNTC, to solve the DR-NTC-W problem. Theoretical analysis shows the robustness guarantee of RFNTC. We evaluate the algorithm by training classifiers on a real-world dataset. Our experimental results show that RFNTC significantly improves the accuracy of the learned classifier by up to 1.05 times.
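For orientation only, the generic Wasserstein distributionally robust learning objective from the broader optimization literature can be sketched as follows. This is a textbook form, not the paper's DR-NTC or DR-NTC-W formulation, and every symbol is an assumption introduced here: \theta for the classifier parameters, \ell for the loss, \hat{P} for the empirical distribution of clean data, \rho for the radius of the robust region, and W for the Wasserstein distance.

```latex
% Generic Wasserstein DRO objective (illustrative; symbols are assumptions,
% not notation taken from the paper).
\min_{\theta} \;
\sup_{Q \,:\, W(Q, \hat{P}) \le \rho} \;
\mathbb{E}_{(x, y) \sim Q} \bigl[ \ell(\theta; x, y) \bigr]
```

Under this reading, the inner supremum ranges over all distributions within Wasserstein distance \rho of the clean data distribution, so a classifier that minimizes it is protected against any data distribution inside that ball.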
Pages: 6212-6226
Page count: 15
Related papers
44 in total
[1]   Security for 5G and Beyond [J].
Ahmad, Ijaz ;
Shahabuddin, Shahriar ;
Kumar, Tanesh ;
Okwuibe, Jude ;
Gurtov, Andrei ;
Ylianttila, Mika .
IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, 2019, 21 (04) :3682-3722
[2]   BAKRY-EMERY CURVATURE-DIMENSION CONDITION AND RIEMANNIAN RICCI CURVATURE BOUNDS [J].
Ambrosio, Luigi ;
Gigli, Nicola ;
Savare, Giuseppe .
ANNALS OF PROBABILITY, 2015, 43 (01) :339-404
[3]   Machine Learning for Encrypted Malware Traffic Classification: Accounting for Noisy Labels and Non-Stationarity [J].
Anderson, Blake ;
McGrew, David .
KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, :1723-1732
[4]  
Arjovsky M, 2017, PR MACH LEARN RES, V70
[5]   FedPacket: A Federated Learning Approach to Mobile Packet Classification [J].
Bakopoulou, Evita ;
Tillman, Balint ;
Markopoulou, Athina .
IEEE TRANSACTIONS ON MOBILE COMPUTING, 2022, 21 (10) :3609-3628
[6]   Robust Solutions of Optimization Problems Affected by Uncertain Probabilities [J].
Ben-Tal, Aharon ;
den Hertog, Dick ;
De Waegenaere, Anja ;
Melenberg, Bertrand ;
Rennen, Gijs .
MANAGEMENT SCIENCE, 2013, 59 (02) :341-357
[7]   TEAVAR: Striking the Right Utilization-Availability Balance in WAN Traffic Engineering [J].
Bogle, Jeremy ;
Bhatia, Nikhil ;
Ghobadi, Manya ;
Menache, Ishai ;
Bjorner, Nikolaj ;
Valadarsky, Asaf ;
Schapira, Michael .
SIGCOMM '19 - PROCEEDINGS OF THE ACM SPECIAL INTEREST GROUP ON DATA COMMUNICATION, 2019, :29-43
[8]   Constructing Pathway-Based Priors within a Gaussian Mixture Model for Bayesian Regression and Classification [J].
Boluki, Shahin ;
Esfahani, Mohammad Shahrokh ;
Qian, Xiaoning ;
Dougherty, Edward R. .
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, 16 (02) :524-537
[9]   Digital Twin for Federated Analytics Using a Bayesian Approach [J].
Chen, Dawei ;
Wang, Dan ;
Zhu, Yifei ;
Han, Zhu .
IEEE INTERNET OF THINGS JOURNAL, 2021, 8 (22) :16301-16312
[10]   Distributionally Robust Optimization Under Moment Uncertainty with Application to Data-Driven Problems [J].
Delage, Erick ;
Ye, Yinyu .
OPERATIONS RESEARCH, 2010, 58 (03) :595-612