FedArtML: A Tool to Facilitate the Generation of Non-IID Datasets in a Controlled Way to Support Federated Learning Research

Cited by: 1
Authors
Gutierrez, Daniel Mauricio Jimenez [1]
Anagnostopoulos, Aris [1]
Chatzigiannakis, Ioannis [1]
Vitaletti, Andrea [1]
Affiliation
[1] Sapienza Univ Rome, Dept Comp Control & Management Engn, I-00185 Rome, Italy
Source
IEEE Access | 2024, Vol. 12
Keywords
Data models; Measurement; Training; Data privacy; Systematics; Federated learning; Distributed databases; Machine learning; Centralized datasets; client's heterogeneity; federated datasets; federated learning; heterogeneity metrics; machine learning; non-IID-ness;
DOI
10.1109/ACCESS.2024.3410026
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Federated Learning (FL) enables collaborative training of Machine Learning (ML) models across decentralized clients while preserving data privacy. A key challenge for FL arises when the clients' data are not independent and identically distributed (non-IID). It is therefore crucial to quantify how non-IID data affect model performance. However, because few federated datasets are available, real-world simulations are difficult to carry out. In this work, we propose for the first time 1) the Hist-Dirichlet-based and Min-Size-Dirichlet methods, which partition data across multiple nodes according to the feature and quantity distributions using the Dirichlet distribution. We use 2) the Jensen-Shannon and Hellinger distances to quantify how far the clients' data are from IID. Moreover, we implement 3) state-of-the-art partitioning methods based on the distribution of labels across clients. All our proposals are open source in a library called FedArtML, publicly available on PyPI. The library facilitates research on cross-silo and cross-device FL by allowing a systematic and controlled partition of centralized datasets according to label, feature, and quantity skewness. To demonstrate the value of the proposed methods and the robustness of FedArtML, we ran experiments on ECG arrhythmia detection with Physionet 2020 data. Our results show that the tool generates federated datasets for multi-client model training and accurately measures the heterogeneity of client distributions. Our approach achieves 48% higher non-IID-ness than existing feature skew methods, providing more granularity. Furthermore, we validate our simulated federated datasets against real-world data and find only a 2% difference in F1-Score, which supports the method's real-life applicability.
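
To make the ideas in the abstract concrete, the following is a minimal sketch of label-skew partitioning with the Dirichlet distribution, together with the Jensen-Shannon and Hellinger distances between discrete distributions. It is not FedArtML's API: every function name and parameter below is hypothetical, the partitioner is a generic label-based Dirichlet split rather than the paper's Hist-Dirichlet-based or Min-Size-Dirichlet methods, and comparing each client with the global label distribution is an assumption made only for illustration.

```python
import math
import random
from collections import Counter


def dirichlet_sample(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) of dimension k."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for very small alpha
        return [1.0 / k] * k
    return [d / total for d in draws]


def partition_by_label(labels, num_clients, alpha, seed=0):
    """Hypothetical helper: split sample indices across clients, class by class.

    For each class, Dirichlet shares decide what fraction of that class
    each client receives. Smaller alpha gives more skewed shares.
    """
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        shares = dirichlet_sample(alpha, num_clients, rng)
        start, cumulative = 0, 0.0
        for c, share in enumerate(shares):
            cumulative += share
            if c == num_clients - 1:
                end = len(indices)  # last client takes the remainder
            else:
                end = min(len(indices), round(cumulative * len(indices)))
            clients[c].extend(indices[start:end])
            start = end
    return clients


def label_distribution(labels, indices, classes):
    """Relative frequency of each class among the given sample indices."""
    counts = Counter(labels[i] for i in indices)
    n = sum(counts.values())
    return [counts[c] / n if n else 0.0 for c in classes]


def _kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2), in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return math.sqrt(max(0.0, 0.5 * _kl(p, m) + 0.5 * _kl(q, m)))


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared / 2)


if __name__ == "__main__":
    labels = [i % 4 for i in range(2000)]  # synthetic, balanced 4-class labels
    classes = sorted(set(labels))
    global_dist = label_distribution(labels, range(len(labels)), classes)
    for alpha in (100.0, 0.1):
        parts = partition_by_label(labels, num_clients=5, alpha=alpha, seed=42)
        for c, part in enumerate(parts):
            if not part:  # a client may receive no data when alpha is small
                continue
            dist = label_distribution(labels, part, classes)
            js = jensen_shannon_distance(dist, global_dist)
            hd = hellinger_distance(dist, global_dist)
            print(f"alpha={alpha} client={c} n={len(part)} JS={js:.3f} HD={hd:.3f}")
```

Both distances are 0 for identical distributions and 1 for distributions with disjoint support, so a large alpha is expected to yield values near 0 for every client and a small alpha to push them upward. This is the kind of controlled non-IID-ness the abstract describes, although the library's own partitioners and multi-client metrics may be defined differently.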
Pages: 81004-81016
Number of pages: 13