FedVQA: Personalized Federated Visual Question Answering over Heterogeneous Scenes

被引:2
作者
Lao, Mingrui [1 ]
Pu, Nan [2 ]
Zhong, Zhun [3 ]
Sebe, Nicu [2 ]
Lew, Michael S. [1 ]
机构
[1] Leiden Univ, Leiden, Netherlands
[2] Univ Trento, Trento, Italy
[3] Univ Nottingham, Nottingham, England
来源
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023年
关键词
Personalized Federated Learning; Visual Question Answering; Knowledge Preserving; Pairwise Preference;
D O I
10.1145/3581783.3611958
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper presents a new setting for visual question answering (VQA) called personalized federated VQA (FedVQA) that addresses the growing need for decentralization and data privacy protection. FedVQA is both practical and challenging, requiring clients to learn well-personalized models on scene-specific datasets with severe feature/label distribution skews. These models then collaborate to optimize a generic global model on a central server, which is desired to generalize well on both seen and unseen scenes without sharing raw data with the server and other clients. The primary challenge of FedVQA is that, client models tend to forget the global knowledge initialized from central server during the personalized training, which impairs their personalized capacity due to the potential overfitting issue on local data. This further leads to divergence issues when aggregating distinct personalized knowledge at the central server, resulting in an inferior generalization ability on unseen scenes. To address the challenge, we propose a novel federated pairwise preference preserving (FedP3) framework to improve personalized learning via preserving generic knowledge under FedVQA constraints. Specifically, we first design a differentiable pairwise preference (DPP) to improve knowledge preserving by formulating a flexible yet effective global knowledge. Then, we introduce a forgotten-knowledge filter (FKF) to encourage the client models to selectively consolidate easily-forgotten knowledge. By aggregating the DPP and the FKF, FedP3 coordinates the generic and the personalized knowledge to enhance the personalized ability of clients and generalizability of the server. Extensive experiments show that FedP3 consistently surpasses the state-of-the-art in FedVQA task.
引用
收藏
页码:7796 / 7807
页数:12
相关论文
共 61 条
[1]  
Acar Durmus Alp Emre, 2021, arXiv preprint arXiv:2111.04263
[2]   Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering [J].
Agrawal, Aishwarya ;
Batra, Dhruv ;
Parikh, Devi ;
Kembhavi, Aniruddha .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :4971-4980
[3]   FedCD: Personalized Federated Learning via Collaborative Distillation [J].
Ahmad, Sabtain ;
Aral, Atakan .
2022 IEEE/ACM 15TH INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING, UCC, 2022, :189-194
[4]   Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J].
Anderson, Peter ;
He, Xiaodong ;
Buehler, Chris ;
Teney, Damien ;
Johnson, Mark ;
Gould, Stephen ;
Zhang, Lei .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :6077-6086
[5]   VQA: Visual Question Answering [J].
Antol, Stanislaw ;
Agrawal, Aishwarya ;
Lu, Jiasen ;
Mitchell, Margaret ;
Batra, Dhruv ;
Zitnick, C. Lawrence ;
Parikh, Devi .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :2425-2433
[6]   RMLVQA: A Margin Loss Approach For Visual Question Answering with Language Biases [J].
Basu, Abhipsa ;
Addepalli, Sravanti ;
Babu, R. Venkatesh .
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, :11671-11680
[7]   Large-Scale Machine Learning with Stochastic Gradient Descent [J].
Bottou, Leon .
COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010, :177-186
[8]  
Briggs C., 2020, Federated learning with hierarchical clustering of local updates to improve training on non-IID data, DOI DOI 10.1109/IJCNN48605.2020.9207469
[9]   UNITER: UNiversal Image-TExt Representation Learning [J].
Chen, Yen-Chun ;
Li, Linjie ;
Yu, Licheng ;
El Kholy, Ahmed ;
Ahmed, Faisal ;
Gan, Zhe ;
Cheng, Yu ;
Liu, Jingjing .
COMPUTER VISION - ECCV 2020, PT XXX, 2020, 12375 :104-120
[10]  
Fallah Alireza, 2020, Personalized federated learning: A meta-learning approach