FedArtML: A Tool to Facilitate the Generation of Non-IID Datasets in a Controlled Way to Support Federated Learning Research

Cited by: 1
Authors
Gutierrez, Daniel Mauricio Jimenez [1]
Anagnostopoulos, Aris [1]
Chatzigiannakis, Ioannis [1]
Vitaletti, Andrea [1]
Affiliation
[1] Sapienza University of Rome, Department of Computer, Control and Management Engineering, I-00185 Rome, Italy
Source
IEEE Access, 2024, Vol. 12
Keywords
Data models; Measurement; Training; Data privacy; Systematics; Federated learning; Distributed databases; Machine learning; Centralized datasets; client's heterogeneity; federated datasets; federated learning; heterogeneity metrics; machine learning; non-IID-ness
DOI
10.1109/ACCESS.2024.3410026
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Federated Learning (FL) enables collaborative training of Machine Learning (ML) models across decentralized clients while preserving data privacy. One of the challenges FL faces arises when the clients' data are not independent and identically distributed (non-IID). It is therefore crucial to quantify how non-IID data affect performance. However, because few federated datasets are available, realistic simulations are difficult to carry out. In this work, we propose for the first time 1) the Hist-Dirichlet-based and Min-Size-Dirichlet methods, which use the Dirichlet distribution to partition data across multiple nodes according to the feature and quantity distributions. We also use 2) the Jensen-Shannon and Hellinger distances to quantify the degree of non-IID-ness. Moreover, we implemented 3) state-of-the-art partitioning methods based on the distribution of labels across clients. All our proposals are open source in a library called FedArtML, publicly available on PyPI. It facilitates research on cross-silo and cross-device FL by enabling systematic, controlled partitioning of centralized datasets using label, feature, and quantity skew. To demonstrate the value of our methods and the robustness of FedArtML, we conducted experiments on ECG arrhythmia detection using the PhysioNet 2020 data. Our results show that the tool generates federated datasets for multi-client model training and accurately measures the heterogeneity of client distributions. Our approach achieves 48% higher non-IID-ness than existing feature-skew methods, providing finer granularity. Furthermore, validating our simulated federated datasets against real-world data reveals a difference of only 2% in F1-score, affirming the method's real-world applicability.
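The abstract combines two ingredients: Dirichlet-based partitioning to control how skewed each client's data is, and the Jensen-Shannon and Hellinger distances to measure the resulting non-IID-ness. The following is a minimal NumPy sketch of both, assuming a generic label-skew Dirichlet split with a minimum-client-size resampling loop, a common technique in the FL literature. It is not FedArtML's API and not the paper's Hist-Dirichlet or Min-Size-Dirichlet implementation; all function names, and the "mean distance to the global label distribution" aggregate, are hypothetical choices made for illustration.

```python
# Illustrative sketch only: not FedArtML's API. All names are hypothetical.
import numpy as np


def dirichlet_label_partition(labels, n_clients, alpha, min_size=10, seed=0, max_tries=1000):
    """Split sample indices across clients using per-class Dirichlet proportions.

    Smaller alpha -> each class concentrates on fewer clients (more non-IID).
    Resamples until every client holds at least `min_size` samples.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    for _ in range(max_tries):
        client_idx = [[] for _ in range(n_clients)]
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            rng.shuffle(idx)
            # Share of class c assigned to each client
            props = rng.dirichlet(np.full(n_clients, alpha))
            cuts = (np.cumsum(props) * len(idx)).astype(int)[:-1]
            for k, part in enumerate(np.split(idx, cuts)):
                client_idx[k].extend(part.tolist())
        if min(len(ci) for ci in client_idx) >= min_size:
            return [np.sort(np.array(ci, dtype=int)) for ci in client_idx]
    raise RuntimeError("could not satisfy min_size; lower it or raise alpha")


def label_distribution(labels, idx, classes):
    """Normalized label histogram of the samples selected by `idx`."""
    counts = np.array([np.sum(labels[idx] == c) for c in classes], dtype=float)
    return counts / counts.sum()


def _kl_bits(p, q):
    mask = p > 0  # terms with p == 0 contribute 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logs, in [0, 1]."""
    m = 0.5 * (p + q)
    jsd = 0.5 * _kl_bits(p, m) + 0.5 * _kl_bits(q, m)
    return float(np.sqrt(max(jsd, 0.0)))  # guard tiny negative rounding


def hellinger_distance(p, q):
    """Hellinger distance, in [0, 1]."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def mean_distance_to_global(labels, parts, dist):
    """Hypothetical aggregate: average distance between each client's
    label distribution and the pooled (global) label distribution."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    global_p = label_distribution(labels, np.arange(len(labels)), classes)
    return float(np.mean([dist(label_distribution(labels, ix, classes), global_p) for ix in parts]))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    y = rng.integers(0, 5, size=1000)  # synthetic labels, 5 classes
    for alpha in (100.0, 0.5):
        parts = dirichlet_label_partition(y, n_clients=10, alpha=alpha)
        print(alpha,
              round(mean_distance_to_global(y, parts, js_distance), 3),
              round(mean_distance_to_global(y, parts, hellinger_distance), 3))
```

Lowering `alpha` concentrates each class on fewer clients, so both distances should rise as `alpha` falls. Using base-2 logarithms keeps the Jensen-Shannon distance in [0, 1], on the same scale as the Hellinger distance, which makes the two metrics easy to compare side by side.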
Pages: 81004-81016
Number of pages: 13