Efficient analysis of COVID-19 clinical data using machine learning models

Cited by: 19
Authors
Ali, Sarwan [1]
Zhou, Yijing [1]
Patterson, Murray [1]
Affiliations
[1] Georgia State Univ, Atlanta, GA 30303 USA
Keywords
COVID-19; Coronavirus; Clinical data; Classification; Feature selection
DOI
10.1007/s11517-022-02570-8
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Because COVID-19 spread rapidly to almost every part of the globe, huge volumes of data and case studies have become available, giving researchers a unique opportunity to find trends and make discoveries by leveraging such big data. This data comes in many varieties and at different levels of veracity (e.g., precise, imprecise, uncertain, and missing), which makes it challenging to extract meaningful information. Yet efficient analysis of this continuously growing and evolving COVID-19 data is crucial to inform, often in real time, the measures needed to control, mitigate, and ultimately avoid viral spread. Applying machine learning algorithms to this big data is a natural approach, since they can scale quickly to such data and extract the relevant information despite its variety and differing levels of veracity. This is important for COVID-19 and for potential future pandemics in general. In this paper, we design a straightforward encoding of clinical data (on categorical attributes) into a fixed-length feature vector representation and then propose a model that first performs efficient feature selection on that representation. We apply this approach to two clinical datasets of COVID-19 patients and then apply different machine learning algorithms downstream for classification. We show that, with the efficient feature selection algorithm, we can achieve a prediction accuracy of more than 90% in most cases. We also compute the importance of different attributes in the dataset using information gain. This can help policymakers focus on a few informative attributes when studying the disease, rather than on many arbitrary factors that may say little about patient outcomes.
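To make the pipeline in the abstract concrete, here is a minimal sketch of two of its steps: encoding categorical attributes as a fixed-length vector and ranking attributes by information gain. It is an illustration under stated assumptions, not the authors' implementation: one-hot encoding is assumed as the "straightforward encoding", the function names are hypothetical, and the toy records and labels are invented for the example rather than taken from the paper's datasets.

```python
import math
from collections import Counter


def one_hot_encode(records, attributes):
    """Map categorical records to fixed-length 0/1 vectors (assumed encoding)."""
    # One slot per (attribute, value) pair observed in the data, in sorted order.
    slots = [(a, v) for a in attributes
             for v in sorted({r[a] for r in records})]
    vectors = [[1 if r[a] == v else 0 for (a, v) in slots] for r in records]
    return slots, vectors


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(records, labels, attribute):
    """Entropy of the labels minus the entropy left after splitting on attribute."""
    total = len(records)
    groups = {}
    for record, label in zip(records, labels):
        groups.setdefault(record[attribute], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Invented toy data, for illustration only.
records = [
    {"age_group": "60+", "sex": "F"},
    {"age_group": "60+", "sex": "M"},
    {"age_group": "<60", "sex": "F"},
    {"age_group": "<60", "sex": "M"},
]
labels = ["deceased", "deceased", "recovered", "recovered"]

slots, vectors = one_hot_encode(records, ["age_group", "sex"])
gains = {a: information_gain(records, labels, a) for a in ["age_group", "sex"]}
```

In this toy data the outcome is fully determined by `age_group` and independent of `sex`, so the first attribute receives the highest possible gain and the second receives none. The encoded `vectors` could then be passed to any downstream classifier.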
Pages: 1881-1896
Number of pages: 16