Serverless Federated AUPRC Optimization for Multi-Party Collaborative Imbalanced Data Mining

Times cited: 2
Authors
Wu, Xidong [1]
Hu, Zhengmian [1]
Pei, Jian [2]
Huang, Heng [3]
Affiliations
[1] Univ Pittsburgh, Dept Elect & Comp Engn, Pittsburgh, PA 15260 USA
[2] Duke Univ, Dept Comp Sci, Durham, NC 27706 USA
[3] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
Source
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023
Keywords
AUPRC; federated learning; imbalanced data; stochastic optimization; serverless federated learning
DOI
10.1145/3580305.3599499
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
To address big data challenges, serverless multi-party collaborative training has recently attracted attention in the data mining community because it reduces communication cost by avoiding the server-node bottleneck. However, traditional serverless multi-party collaborative training algorithms were mainly designed for balanced data mining tasks and optimize accuracy-oriented objectives (e.g., cross-entropy loss). In many real-world applications the data distribution is skewed, and classifiers trained to improve accuracy perform poorly on imbalanced data tasks because the models can be significantly biased toward the majority class. The Area Under the Precision-Recall Curve (AUPRC) was therefore introduced as an effective metric for such tasks. Although multiple single-machine methods have been designed to train models for AUPRC maximization, algorithms for multi-party collaborative training have not been studied. Moving from the single-machine setting to the multi-party setting poses critical challenges. For example, existing single-machine AUPRC maximization algorithms maintain an inner state for each local data point, and this per-sample dependence makes them inapplicable to large-scale multi-party collaborative training. To address this challenge, we reformulate the serverless multi-party collaborative AUPRC maximization problem as a conditional stochastic optimization problem and propose a new ServerLess biAsed sTochastic gradiEnt (SLATE) algorithm to directly optimize the AUPRC. We then apply a variance reduction technique and propose ServerLess biAsed sTochastic gradiEnt with Momentum-based variance reduction (SLATE-M) to improve the convergence rate; its rate matches the best theoretical convergence result achieved by single-machine online methods.
To the best of our knowledge, this is the first work to solve the multi-party collaborative AUPRC maximization problem. Finally, extensive experiments show the advantages of directly optimizing the AUPRC with distributed learning methods and also verify the efficiency of our new algorithms (i.e., SLATE and SLATE-M).
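The abstract's motivation, that accuracy can look strong on skewed data while the positive class is ranked poorly, can be made concrete with a small sketch. It assumes the common average-precision (AP) estimator of AUPRC; the abstract does not say which estimator or smooth surrogate the paper optimizes, and the function names and toy data below are hypothetical. Each positive example's term in AP is a ratio of two sums over the entire dataset. In a hedged reading, this nested ratio structure is why a mini-batch estimate of the objective is not unbiased, which is consistent with the conditional stochastic optimization view and the "biased stochastic gradient" in SLATE's name.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of hard (0/1) predictions that match the labels."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a common estimator of AUPRC (illustrative sketch).

    For each positive example i, take the precision among all examples
    scored at least as high as i, then average over the positives:
        AP = (1/n_+) * sum_{i: y_i = 1} #{j: y_j = 1, s_j >= s_i} / #{j: s_j >= s_i}
    Each term is a ratio of two sums over the whole dataset.
    """
    positive_scores = [s for y, s in zip(labels, scores) if y == 1]
    if not positive_scores:
        raise ValueError("AP is undefined when there are no positive examples")
    total = 0.0
    for s_i in positive_scores:
        # Labels of every example ranked at or above positive i (ties included).
        at_or_above = [y for y, s in zip(labels, scores) if s >= s_i]
        total += sum(at_or_above) / len(at_or_above)
    return total / len(positive_scores)


if __name__ == "__main__":
    # Hypothetical toy data: 2 positives among 100 examples (2% positive rate).
    labels = [1, 1] + [0] * 98

    # A degenerate model that gives every example the same score.
    flat = [0.0] * 100
    print(accuracy(labels, [int(s >= 0.5) for s in flat]))  # 0.98
    print(average_precision(labels, flat))                  # 0.02

    # A model that ranks one negative between the two positives.
    ranked = [0.9, 0.7, 0.8] + [0.1] * 97
    print(accuracy(labels, [int(s >= 0.5) for s in ranked]))  # 0.99
    print(average_precision(labels, ranked))                  # about 0.833
```

The two toy models have nearly identical accuracy (0.98 vs 0.99), yet AP separates them sharply (0.02 vs about 0.833). The quadratic loop is kept for readability; a sort-based computation would normally be used on real data.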
Pages: 2648-2659
Page count: 12