Prediction and Feature Importance Analysis for Severity of COVID-19 in South Korea Using Artificial Intelligence: Model Development and Validation

Cited by: 25
Authors
Chung, Heewon [1]
Ko, Hoon [1]
Kang, Wu Seong [2]
Kim, Kyung Won [3]
Lee, Hooseok [1]
Park, Chul [4]
Song, Hyun-Ok [5]
Choi, Tae-Young [6]
Seo, Jae Ho [7]
Lee, Jinseok [1]
Affiliations
[1] Catholic Univ Korea, Dept Artificial Intelligence, 43 Jibong Ro, Bucheon 14662, South Korea
[2] Cheju Halla Gen Hosp, Jeju Reg Trauma Ctr, Dept Trauma Surg, Jeju, South Korea
[3] Univ Ulsan, Asan Med Ctr, Radiol & Res Inst Radiol, Coll Med, Seoul, South Korea
[4] Wonkwang Univ, Dept Internal Med, Sch Med, Iksan, South Korea
[5] Wonkwang Univ, Dept Infect Biol, Sch Med, Iksan, South Korea
[6] Wonkwang Univ, Dept Pathol, Sch Med, Iksan, South Korea
[7] Wonkwang Univ, Dept Biochem, Sch Med, Iksan, South Korea
Funding
National Research Foundation, Singapore;
Keywords
COVID-19; artificial intelligence; blood samples; mortality prediction;
DOI
10.2196/27060
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)];
Abstract
Background: The number of deaths from COVID-19 continues to surge worldwide. In particular, if a patient's condition is sufficiently severe to require invasive ventilation, it is more likely to lead to death than to recovery.
Objective: The goal of our study was to analyze the factors related to COVID-19 severity in patients and to develop an artificial intelligence (AI) model to predict the severity of COVID-19 at an early stage.
Methods: We developed an AI model that predicts severity based on data from 5601 COVID-19 patients from all national and regional hospitals across South Korea as of April 2020. The clinical severity of COVID-19 was divided into two categories: low and high severity. The condition of patients in the low-severity group corresponded to no limit of activity, oxygen support with nasal prong or facial mask, and noninvasive ventilation. The condition of patients in the high-severity group corresponded to invasive ventilation, multi-organ failure with extracorporeal membrane oxygenation required, and death. For the AI model input, we used 37 variables from the medical records, including basic patient information, a physical index, initial examination findings, clinical findings, comorbid diseases, and general blood test results at an early stage. Feature importance analysis was performed with AdaBoost, random forest, and eXtreme Gradient Boosting (XGBoost); the AI model for predicting COVID-19 severity among patients was developed with a 5-layer deep neural network (DNN) with the 20 most important features, which were selected based on ranked feature importance analysis of 37 features from the comprehensive data set. The selection procedure was performed using sensitivity, specificity, accuracy, balanced accuracy, and area under the curve (AUC).
Results: We found that age was the most important factor for predicting disease severity, followed by lymphocyte level, platelet count, and shortness of breath or dyspnea. Our proposed 5-layer DNN with the 20 most important features provided high sensitivity (90.2%), specificity (90.4%), accuracy (90.4%), balanced accuracy (90.3%), and AUC (0.96).
Conclusions: Our proposed AI model with the selected features was able to predict the severity of COVID-19 accurately. We also made a web application so that anyone can access the model. We believe that sharing the AI model with the public will be helpful in validating and improving its performance.
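The threshold-based metrics named in the abstract (sensitivity, specificity, accuracy, balanced accuracy) can all be derived from a 2x2 confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. The `ConfusionCounts` type, the `severity_metrics` function, and the patient counts are hypothetical and chosen only for illustration. High severity is treated as the positive class, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary severity classifier (hypothetical helper type).

    Assumption: "high severity" is the positive class.
    """
    tp: int  # high-severity patients predicted as high severity
    fn: int  # high-severity patients predicted as low severity
    tn: int  # low-severity patients predicted as low severity
    fp: int  # low-severity patients predicted as high severity


def severity_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the standard threshold-based metrics as fractions in [0, 1]."""
    sensitivity = c.tp / (c.tp + c.fn)  # recall on the high-severity class
    specificity = c.tn / (c.tn + c.fp)  # recall on the low-severity class
    accuracy = (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)
    # Balanced accuracy averages the two per-class recalls, so it is not
    # dominated by the larger (low-severity) class the way accuracy is.
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Illustrative, made-up counts for an imbalanced test set (not study data).
counts = ConfusionCounts(tp=40, fn=10, tn=900, fp=100)
metrics = severity_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced classes, accuracy tracks the majority class while balanced accuracy weights both classes equally. The reported balanced accuracy of 90.3% is consistent with this definition, since it equals the mean of the reported sensitivity (90.2%) and specificity (90.4%).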
Pages: 15