Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study

Cited by: 42
Authors
Munsch, Nicolas [1]
Martin, Alistair [1]
Gruarin, Stefanie [2]
Nateqi, Jama [2,3]
Abdarahmane, Isselmou [1]
Weingartner-Ortner, Rafael [1,2]
Knapp, Bernhard [1]
Affiliations
[1] Symptoma, Data Sci Dept, Landstr Gurtel 3, A-1030 Vienna, Austria
[2] Symptoma, Med Dept, Attersee, Austria
[3] Paracelsus Med Univ, Dept Internal Med, Salzburg, Austria
Funding
European Union Horizon 2020
Keywords
COVID-19; symptom checkers; benchmark; digital health; symptom; chatbot; accuracy
DOI
10.2196/21299
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Abstract
Background: A large number of web-based COVID-19 symptom checkers and chatbots have been developed; however, anecdotal evidence suggests that their conclusions are highly variable. To our knowledge, no study has evaluated the accuracy of COVID-19 symptom checkers in a statistically rigorous manner.

Objective: The aim of this study is to evaluate and compare the diagnostic accuracies of web-based COVID-19 symptom checkers.

Methods: We identified 10 web-based COVID-19 symptom checkers, all of which were included in the study. We evaluated the COVID-19 symptom checkers by assessing 50 COVID-19 case reports alongside 410 non-COVID-19 control cases. A bootstrapping method was used to counter the unbalanced sample sizes and obtain confidence intervals (CIs). Results are reported as sensitivity, specificity, F1 score, and Matthews correlation coefficient (MCC).

Results: The classification task between COVID-19-positive and COVID-19-negative for "high risk" cases among the 460 test cases yielded (sorted by F1 score): Symptoma (F1=0.92, MCC=0.85), Infermedica (F1=0.80, MCC=0.61), US Centers for Disease Control and Prevention (CDC) (F1=0.71, MCC=0.30), Babylon (F1=0.70, MCC=0.29), Cleveland Clinic (F1=0.40, MCC=0.07), Providence (F1=0.40, MCC=0.05), Apple (F1=0.29, MCC=-0.10), Docyet (F1=0.27, MCC=0.29), Ada (F1=0.24, MCC=0.27), and Your.MD (F1=0.24, MCC=0.27). For "high risk" and "medium risk" combined, the performance was: Symptoma (F1=0.91, MCC=0.83), Infermedica (F1=0.80, MCC=0.61), Cleveland Clinic (F1=0.76, MCC=0.47), Providence (F1=0.75, MCC=0.45), Your.MD (F1=0.72, MCC=0.33), CDC (F1=0.71, MCC=0.30), Babylon (F1=0.70, MCC=0.29), Apple (F1=0.70, MCC=0.25), Ada (F1=0.42, MCC=0.03), and Docyet (F1=0.27, MCC=0.29).

Conclusions: We found that the number of correctly assessed COVID-19 and control cases varies considerably between symptom checkers, with different symptom checkers showing different strengths with respect to sensitivity and specificity. A good balance between sensitivity and specificity was only achieved by two symptom checkers.
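The four reported metrics and the idea of bootstrapping to counter class imbalance can be illustrated with a minimal Python sketch. The abstract does not describe the exact resampling scheme, so the version below is an assumption: each round draws an equal number of COVID-19 and control cases with replacement, and CIs are taken as simple percentiles. The function names (`confusion_counts`, `metrics`, `balanced_bootstrap`) and the parameters `n_rounds` and `seed` are hypothetical and not taken from the study.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = COVID-19, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, F1, and MCC; undefined ratios fall back to 0.0."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "f1": f1, "mcc": mcc}


def balanced_bootstrap(y_true, y_pred, n_rounds=1000, seed=0):
    """Assumed scheme: resample equal-sized classes with replacement each round.

    Returns {metric: (mean, lower, upper)} with a 95% percentile interval.
    """
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y_true) if t == 1]
    neg = [i for i, t in enumerate(y_true) if t == 0]
    k = min(len(pos), len(neg))  # size of the smaller class
    samples = {name: [] for name in ("sensitivity", "specificity", "f1", "mcc")}
    for _ in range(n_rounds):
        idx = rng.choices(pos, k=k) + rng.choices(neg, k=k)
        counts = confusion_counts([y_true[i] for i in idx],
                                  [y_pred[i] for i in idx])
        for name, value in metrics(*counts).items():
            samples[name].append(value)
    summary = {}
    for name, values in samples.items():
        values.sort()
        lower = values[int(0.025 * (n_rounds - 1))]
        upper = values[int(0.975 * (n_rounds - 1))]
        summary[name] = (sum(values) / n_rounds, lower, upper)
    return summary
```

With 50 positive and 410 negative cases, each round in this sketch scores 50 cases of each class, so F1 and MCC are not dominated by the much larger control group.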
Pages: 8