FedArtML: A Tool to Facilitate the Generation of Non-IID Datasets in a Controlled Way to Support Federated Learning Research

Cited by: 1
Authors
Gutierrez, Daniel Mauricio Jimenez [1]
Anagnostopoulos, Aris [1]
Chatzigiannakis, Ioannis [1]
Vitaletti, Andrea [1]
Affiliation
[1] Sapienza Univ Rome, Dept Comp Control & Management Engn, I-00185 Rome, Italy
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Data models; Measurement; Training; Data privacy; Systematics; Federated learning; Distributed databases; Machine learning; Centralized datasets; client's heterogeneity; federated datasets; federated learning; heterogeneity metrics; machine learning; non-IID-ness
DOI
10.1109/ACCESS.2024.3410026
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Federated Learning (FL) enables collaborative training of Machine Learning (ML) models across decentralized clients while preserving data privacy. One of the challenges FL faces arises when the clients' data is not independent and identically distributed (non-IID). It is, therefore, crucial to quantify how non-IID data impacts performance. However, because few federated datasets are available, it is not easy to carry out real-world simulations. In this work, we propose for the first time 1) the Hist-Dirichlet-based and Min-Size-Dirichlet methods, which partition data across multiple nodes using the feature and quantity distributions together with the Dirichlet distribution. We use 2) the Jensen-Shannon and Hellinger distances to quantify how far the clients' data departs from IID. Moreover, we implemented 3) state-of-the-art partitioning methods based on the distribution of labels across clients. All our proposals are available as open source in a library called FedArtML, published on PyPI. The library facilitates research on cross-silo and cross-device FL by allowing a systematic and controlled partition of centralized datasets using label, feature, and quantity skewness. To demonstrate the value of our proposed methods and the robustness of FedArtML, we ran experiments on ECG arrhythmia detection with Physionet 2020 data. Our results demonstrate that our tool generates federated datasets for multi-client model training and accurately measures the heterogeneity of the client distributions. Our approach achieves 48% higher non-IID-ness than existing feature skew methods, providing more granularity. Furthermore, we validate our simulated federated datasets against real-world data, revealing only a 2% F1-Score difference, which affirms the method's real-life applicability.
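As a rough illustration of the ideas named in the abstract, the sketch below splits a labelled dataset across clients with a Dirichlet-driven label skew and then scores each client with the Jensen-Shannon and Hellinger distances. It is not FedArtML's API and it does not reproduce the Hist-Dirichlet-based or Min-Size-Dirichlet methods: every function name here (`dirichlet_label_partition`, `label_distribution`, `hellinger`, `jensen_shannon`) is hypothetical, and comparing each client to the global label distribution is an assumption made for this example only.

```python
import math
import random
from collections import Counter


def dirichlet_label_partition(labels, num_clients, alpha, seed=0):
    """Hypothetical helper: split sample indices across clients, class by class,
    using shares drawn from a symmetric Dirichlet(alpha) distribution.
    Small alpha gives strong label skew; large alpha approaches an even split."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of independent Gamma(alpha, 1) draws, normalized.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        if total == 0:  # guard against numerical underflow for very small alpha
            shares = [1.0 / num_clients] * num_clients
        else:
            shares = [g / total for g in gammas]
        start, cum = 0, 0.0
        for c, share in enumerate(shares):
            cum += share
            # The last client takes whatever remains so that no sample is dropped.
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


def label_distribution(labels, idxs, classes):
    """Relative frequency of each class among the given sample indices."""
    counts = Counter(labels[i] for i in idxs)
    n = len(idxs)
    return [counts[c] / n if n else 0.0 for c in classes]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (range 0 to 1)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)


def jensen_shannon(p, q):
    """Jensen-Shannon distance with base-2 logarithms (range 0 to 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


# Toy data: 400 samples spread evenly over 4 classes (made up for this example).
labels = [i % 4 for i in range(400)]
classes = sorted(set(labels))
global_dist = label_distribution(labels, range(len(labels)), classes)

for alpha in (100.0, 0.1):
    parts = dirichlet_label_partition(labels, num_clients=5, alpha=alpha)
    non_empty = [p for p in parts if p]  # skip clients that received no samples
    js = [jensen_shannon(label_distribution(labels, p, classes), global_dist) for p in non_empty]
    hd = [hellinger(label_distribution(labels, p, classes), global_dist) for p in non_empty]
    print(f"alpha={alpha}: mean JS={sum(js) / len(js):.3f}, mean Hellinger={sum(hd) / len(hd):.3f}")
```

Both distances are 0 for identical distributions and 1 for distributions with disjoint support, which is what makes them convenient as a bounded heterogeneity score in a sketch like this one.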
Pages: 81004 - 81016
Page count: 13