Enhancing Federated Learning Convergence With Dynamic Data Queue and Data-Entropy-Driven Participant Selection

Cited by: 0
Authors
Herath, Charuka [1]
Liu, Xiaolan [1]
Lambotharan, Sangarapillai [1]
Rahulamathavan, Yogachandran [1]
Affiliations
[1] Loughborough Univ London, Inst Digital Technol, London E20 3BS, England
Source
IEEE INTERNET OF THINGS JOURNAL | 2025, Vol. 12, No. 6
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Data models; Convergence; Internet of Things; Distributed databases; Accuracy; Training; Mathematical models; Servers; Adaptation models; Data entropy; fairness FL; federated learning (FL); not identically and independently distributed (non-IID)
DOI
10.1109/JIOT.2024.3491034
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Federated learning (FL) is a decentralized approach to collaborative model training on edge devices. This distributed method of model training offers advantages in privacy, security, regulatory compliance, and cost efficiency. This research addresses statistical complexity in FL, especially when the data stored locally across devices is not identically and independently distributed (non-IID). We observe an accuracy reduction of approximately 10%-30%, particularly in skewed scenarios where each edge device trains on only one class of data. This reduction is attributed to weight divergence, quantified using the Euclidean distance between device-level class distributions and the population distribution, which results in a bias term (δ_k). As a solution, we present a method to improve convergence in FL by creating a global subset of data on the server and dynamically distributing it across devices using a dynamic data queue-driven FL (DDFL). Next, we leverage data entropy metrics to observe the process during each training round and enable reasonable device selection for aggregation. Furthermore, we provide a convergence analysis of the proposed DDFL to justify its viability in practical FL scenarios, aiming for better device selection, a non-suboptimal global model, and faster convergence. With a 10% global subset of data, our approach yields an accuracy boost of approximately 5% on the MNIST dataset, around 18% on CIFAR-10, and 20% on CIFAR-100, outperforming the state-of-the-art (SOTA) aggregation algorithms.
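The two quantities named in the abstract, the Euclidean distance between a device's class distribution and the population distribution, and the data entropy of a device's local labels, can be illustrated with a minimal sketch. This is not the paper's DDFL algorithm: the function names, the four-class toy data, and the choice of Shannon entropy in bits are all assumptions made for illustration, and the abstract does not specify how these values feed into device selection.

```python
import math
from collections import Counter


def class_distribution(labels, num_classes):
    """Empirical class distribution of integer labels in [0, num_classes)."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def euclidean_distance(p, q):
    """Euclidean distance between two distributions of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


# Hypothetical toy setup: 4 classes, uniform population distribution.
num_classes = 4
population = [1 / num_classes] * num_classes

# A skewed device holding a single class, and a balanced device.
skewed = class_distribution([0] * 8, num_classes)
balanced = class_distribution([0, 1, 2, 3] * 2, num_classes)

skewed_distance = euclidean_distance(skewed, population)
balanced_distance = euclidean_distance(balanced, population)
skewed_entropy = shannon_entropy(skewed)
balanced_entropy = shannon_entropy(balanced)

print(f"skewed:   distance={skewed_distance:.3f}, entropy={skewed_entropy:.3f}")
print(f"balanced: distance={balanced_distance:.3f}, entropy={balanced_entropy:.3f}")
```

In this toy case the single-class device sits far from the population distribution and has zero entropy, while the balanced device matches the population exactly and has the maximum entropy for four classes.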
Pages: 6646-6658
Page count: 13