Thoughts on Non-IID Data Impact in Healthcare with Federated Learning Medical Blockchain

Cited by: 0
Authors
Shae, Zon-Yin [1 ]
Chen, Kun-Yi [1 ]
Chang, Chi-Yu [1 ]
Tsai, Yuan-Yu [2 ]
Chou, Che-Yi [3 ]
Baskett, William I. [4 ]
Shyu, Chi-Ren [4 ]
Tsai, Jeffrey J. P. [5 ]
Affiliations
[1] Asia Univ, Dept Comp Sci & Informat Engn, Taichung, Taiwan
[2] Asia Univ, Dept M Commerce & Multimedia Applicat, Taichung, Taiwan
[3] Asia Univ Hosp, Div Nephrol, Taichung, Taiwan
[4] Univ Missouri, Inst Data Sci & Informat, Columbia, MO USA
[5] Asia Univ, Dept Bioinformat & Med Engn, Taichung, Taiwan
Source
2022 IEEE 4TH INTERNATIONAL CONFERENCE ON COGNITIVE MACHINE INTELLIGENCE, COGMI | 2022
Keywords
Medical blockchain; non-iid; health data privacy; medical data set aggregation; distributed federated medical data lake; AI; federated learning;
DOI
10.1109/CogMI56440.2022.00013
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A common belief holds that the more good-quality training data is aggregated, the better the resulting Artificial Intelligence (AI) model performs. In the medical domain, however, this belief does not generally hold, because healthcare data sets sourced from different hospitals are often not identically distributed (Non-IID). This poses severe technical challenges for effectively aggregating individual hospital data sets. In this vision paper, rather than offering complete solutions, we raise questions and food for thought aimed at enabling effective data aggregation and improving federated learning (FL) AI model performance: (1) benchmark and measure the Non-IID degree of medical data sets; (2) include Non-IID degree metrics in the FL data aggregation mechanism; (3) search for the optimal global model creation strategy among a group of many medical data sets; and (4) investigate whether FL can perform better than centralized learning. We discuss these questions by outlining a visionary approach for exploring a medical blockchain FL mechanism that effectively aggregates medical data across multiple healthcare systems to serve large populations with broad demographics.
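As a purely illustrative sketch of question (1), one possible way to quantify a Non-IID degree is to compare each hospital's label distribution with the pooled distribution using the Jensen-Shannon divergence. This is an assumption made for illustration, not the metric proposed in the paper; the site names and toy labels are hypothetical, and the sketch covers label skew only (not feature or quantity skew). Each site is assumed to hold at least one label.

```python
import math
from collections import Counter


def label_distribution(labels, classes):
    """Empirical class frequencies for one site's (non-empty) label list."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in classes]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical, 1 for disjoint distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def non_iid_degrees(site_labels):
    """Divergence of each site's label distribution from the pooled distribution."""
    classes = sorted({y for labels in site_labels.values() for y in labels})
    pooled = [y for labels in site_labels.values() for y in labels]
    pooled_dist = label_distribution(pooled, classes)
    return {
        site: js_divergence(label_distribution(labels, classes), pooled_dist)
        for site, labels in site_labels.items()
    }


# Hypothetical toy labels (0 = negative, 1 = positive) for three sites.
sites = {
    "hospital_a": [0] * 50 + [1] * 50,
    "hospital_b": [0] * 50 + [1] * 50,
    "hospital_c": [0] * 95 + [1] * 5,
}
degrees = non_iid_degrees(sites)
for site, degree in degrees.items():
    print(f"{site}: {degree:.4f}")
```

A score of this kind could, in principle, feed into question (2), for example as a per-site weight during aggregation; whether and how to do so is exactly what the paper leaves open.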
Pages: 20-26
Page count: 7