ModalityMirror: Enhancing Audio Classification in Modality Heterogeneity Federated Learning via Multimodal Distillation

Cited by: 0
Authors
Feng, Tiantian [1]
Zhang, Tuo [1]
Avestimehr, Salman [1]
Narayanan, Shrikanth [1]
Affiliations
[1] Univ Southern Calif, Los Angeles, CA 90007 USA
Source
PROCEEDINGS OF THE 2025 THE 35TH EDITION OF THE WORKSHOP ON NETWORK AND OPERATING SYSTEM SUPPORT FOR DIGITAL AUDIO AND VIDEO, NOSSDAV 2025 | 2025
Keywords
Speech recognition; Federated learning; Multimodal learning; Model distillation
DOI
10.1145/3712678.3721885
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal federated learning (FL) frequently faces client modality heterogeneity, which degrades performance for the secondary modality in multimodal learning. The problem is particularly prevalent in audiovisual learning, where audio is often assumed to be the weaker modality in recognition tasks. To address this challenge, we introduce ModalityMirror, which improves audio model performance by distilling knowledge from an audiovisual federated learning model. ModalityMirror involves two phases: a modality-wise FL stage that aggregates unimodal encoders, and a federated knowledge distillation stage on multimodal clients that trains a unimodal student model. Our results demonstrate that ModalityMirror significantly improves audio classification compared with state-of-the-art FL methods such as Harmony, particularly in audiovisual FL settings where video is missing. Our approach unlocks the potential for exploiting the diverse modality spectrum inherent in multimodal FL.
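The two phases named in the abstract can be illustrated with a minimal sketch. The abstract does not give the aggregation rule, loss function, or hyperparameters, so everything below is an assumption: a FedAvg-style sample-weighted average stands in for the modality-wise aggregation, and a temperature-softened KL divergence stands in for the distillation objective. The names `fedavg`, `distillation_loss`, the client dictionaries, and the temperature value are hypothetical.

```python
import math

def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of parameter vectors (FedAvg-style; assumed rule)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2 (assumed objective)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Phase 1 (sketch): aggregate the audio encoder only across clients that hold audio.
# Toy two-parameter "encoders"; the third client has video only and is skipped.
clients = [
    {"modalities": {"audio", "video"}, "audio_encoder": [1.0, 2.0], "n": 10},
    {"modalities": {"audio"}, "audio_encoder": [3.0, 4.0], "n": 30},
    {"modalities": {"video"}, "audio_encoder": None, "n": 20},
]
audio_clients = [c for c in clients if "audio" in c["modalities"]]
global_audio = fedavg(
    [c["audio_encoder"] for c in audio_clients],
    [c["n"] for c in audio_clients],
)

# Phase 2 (sketch): on a multimodal client, the audiovisual teacher's logits
# supervise the audio-only student's logits for the same sample.
teacher_logits = [2.0, 0.5, -1.0]   # hypothetical audiovisual model output
student_logits = [1.0, 0.8, -0.5]   # hypothetical audio-only model output
loss = distillation_loss(student_logits, teacher_logits)
```

In this sketch the video-only client contributes nothing to the audio encoder, and the distillation loss is zero only when the student's softened distribution matches the teacher's.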
Pages: 78-83
Page count: 6