Adapter-Based Selective Knowledge Distillation for Federated Multi-Domain Meeting Summarization

Cited by: 0
Authors
Feng, Xiachong [1 ]
Feng, Xiaocheng [2 ,3 ]
Du, Xiyuan [3 ]
Kan, Min-Yen [4 ]
Qin, Bing [2 ,3 ]
Affiliations
[1] Univ Hong Kong, Hong Kong, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[3] Harbin Inst Technol, Harbin 150001, Peoples R China
[4] Natl Univ Singapore, Sch Comp, Singapore 117417, Singapore
Funding
National Key Research and Development Program of China;
Keywords
Adaptation models; Servers; Federated learning; Data models; Task analysis; Training; Optimization; Meeting summarization; federated learning; knowledge distillation; parameter-efficient fine-tuning;
DOI
10.1109/TASLP.2024.3414313
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Meeting summarization has emerged as a promising technique for providing users with condensed summaries. However, existing work has focused on training models on centralized data, neglecting real-world scenarios where meeting data are infeasible to collect centrally due to their sensitive nature. This gap motivates us to explore federated learning for meeting summarization. Two critical challenges impede progress. First, state-of-the-art summarizers are based on parameter-heavy pre-trained models, and exchanging such a model's parameters across clients imposes large bandwidth costs. Second, real-world meeting data belong to various domains and are distributed across clients, so they are non-identically and independently distributed (non-IID); the IID assumption does not hold, which changes which learning algorithms apply best. To address these challenges, we propose Adapter-based Federated Selective Knowledge Distillation (AdaFedSelecKD) for training performant client models. Specifically, we develop an adapter-based summarization model in which two adapters cooperatively facilitate learning with far fewer trainable parameters, reducing communication costs. We then devise a selective knowledge distillation strategy that helps each client robustly perform domain-focused modeling on its own data while leveraging the global parameters learned from non-IID data. Extensive experiments on the QMSum benchmark demonstrate that AdaFedSelecKD achieves performance comparable to powerful centralized training methods and exhibits strong generalizability and robustness.
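The abstract names two ingredients: a pair of lightweight adapters trained in place of the full pre-trained summarizer, and a selective distillation loss through which a client absorbs global knowledge only where it is reliable. The PyTorch sketch below is purely illustrative and is not the authors' implementation: the module names, the bottleneck width, and in particular the confidence-threshold rule used to decide which tokens are distilled are assumptions made for this example; the paper's actual selection criterion may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck adapter inserted into a frozen pre-trained summarizer.

    Only these parameters would be trained and exchanged with the server,
    which is how an adapter-based scheme cuts communication cost.
    """
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck: h + up(relu(down(h)))
        return h + self.up(F.relu(self.down(h)))


def selective_kd_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      conf_threshold: float = 0.5) -> torch.Tensor:
    """Distill from a global (teacher) model into a local (student) model
    only on tokens where the teacher is confident.

    The confidence-based token selection is an assumption for illustration,
    not the criterion specified in the paper.
    """
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    # Per-token KL divergence between teacher and student distributions.
    kl = (t_probs * (t_probs.clamp_min(1e-9).log() - s_logp)).sum(-1)
    # Keep only tokens where the teacher's top probability is high enough.
    mask = (t_probs.max(-1).values > conf_threshold).float()
    kd = (kl * mask).sum() / mask.sum().clamp_min(1.0)
    # Standard cross-entropy on the client's own labeled data.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         labels.view(-1), ignore_index=-100)
    return ce + (temperature ** 2) * kd
```

Under this sketch, a federated round would update only the Adapter parameters with selective_kd_loss, taking teacher logits from the aggregated global adapter, and upload just the adapter weights to the server.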
Pages: 3694-3708
Page count: 15