Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study

Cited by: 42

Authors:

Munsch, Nicolas [1]

Martin, Alistair [1]

Gruarin, Stefanie [2]

Nateqi, Jama [2,3]

Abdarahmane, Isselmou [1]

Weingartner-Ortner, Rafael [1,2]

Knapp, Bernhard [1]

Affiliations:

[1] Symptoma, Data Sci Dept, Landstr Gurtel 3, A-1030 Vienna, Austria

[2] Symptoma, Med Dept, Attersee, Austria

[3] Paracelsus Med Univ, Dept Internal Med, Salzburg, Austria

Source:

JOURNAL OF MEDICAL INTERNET RESEARCH | 2020 / Volume 22 / Issue 10

Funding:

European Union Horizon 2020;

Keywords:

COVID-19; symptom checkers; benchmark; digital health; symptom; chatbot; accuracy;

DOI:

10.2196/21299

Chinese Library Classification (CLC) number:

R19 [Health care organizations and services (health services administration)];

Subject classification code:

Abstract:

Background: A large number of web-based COVID-19 symptom checkers and chatbots have been developed; however, anecdotal evidence suggests that their conclusions are highly variable. To our knowledge, no study has evaluated the accuracy of COVID-19 symptom checkers in a statistically rigorous manner.

Objective: The aim of this study is to evaluate and compare the diagnostic accuracies of web-based COVID-19 symptom checkers.

Methods: We identified 10 web-based COVID-19 symptom checkers, all of which were included in the study. We evaluated the COVID-19 symptom checkers by assessing 50 COVID-19 case reports alongside 410 non-COVID-19 control cases. A bootstrapping method was used to counter the unbalanced sample sizes and obtain confidence intervals (CIs). Results are reported as sensitivity, specificity, F1 score, and Matthews correlation coefficient (MCC).

Results: The classification task between COVID-19-positive and COVID-19-negative for "high risk" cases among the 460 test cases yielded (sorted by F1 score): Symptoma (F1=0.92, MCC=0.85), Infermedica (F1=0.80, MCC=0.61), US Centers for Disease Control and Prevention (CDC) (F1=0.71, MCC=0.30), Babylon (F1=0.70, MCC=0.29), Cleveland Clinic (F1=0.40, MCC=0.07), Providence (F1=0.40, MCC=0.05), Apple (F1=0.29, MCC=-0.10), Docyet (F1=0.27, MCC=0.29), Ada (F1=0.24, MCC=0.27), and Your.MD (F1=0.24, MCC=0.27). For "high risk" and "medium risk" combined, the performance was: Symptoma (F1=0.91, MCC=0.83), Infermedica (F1=0.80, MCC=0.61), Cleveland Clinic (F1=0.76, MCC=0.47), Providence (F1=0.75, MCC=0.45), Your.MD (F1=0.72, MCC=0.33), CDC (F1=0.71, MCC=0.30), Babylon (F1=0.70, MCC=0.29), Apple (F1=0.70, MCC=0.25), Ada (F1=0.42, MCC=0.03), and Docyet (F1=0.27, MCC=0.29).

Conclusions: We found that the number of correctly assessed COVID-19 and control cases varies considerably between symptom checkers, with different symptom checkers showing different strengths with respect to sensitivity and specificity. A good balance between sensitivity and specificity was only achieved by two symptom checkers.
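
Illustrative note: the abstract names four metrics and a bootstrapping step but does not spell out the procedure. The following is a minimal Python sketch of how such an evaluation could be computed, assuming binary labels (1 = COVID-19, 0 = control) and a bootstrap that draws equally sized samples of cases and controls with replacement. The function names, the number of rounds, and the percentile-based interval are assumptions for illustration and are not taken from the study.

```python
import math
import random


def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, F1 and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "f1": f1, "mcc": mcc}


def balanced_bootstrap_f1(y_true, y_pred, n_rounds=1000, seed=0):
    """Mean F1 and a 95% percentile interval over balanced resamples.

    Assumption: each round draws the same number of cases and controls
    (with replacement) so the larger control group does not dominate.
    """
    rng = random.Random(seed)
    # Predictions split by the true class of each test case.
    pos_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg_preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    k = min(len(pos_preds), len(neg_preds))
    scores = []
    for _ in range(n_rounds):
        sample_pos = rng.choices(pos_preds, k=k)
        sample_neg = rng.choices(neg_preds, k=k)
        tp = sum(sample_pos)   # cases flagged as COVID-19
        fn = k - tp            # cases missed
        fp = sum(sample_neg)   # controls flagged as COVID-19
        tn = k - fp            # controls correctly cleared
        scores.append(metrics(tp, fp, tn, fn)["f1"])
    scores.sort()
    lower = scores[int(0.025 * n_rounds)]
    upper = scores[int(0.975 * n_rounds) - 1]
    return sum(scores) / n_rounds, (lower, upper)
```

With 50 cases and 410 controls, as in the study, each round of this sketch would compare 50 resampled cases against 50 resampled controls. Any predictions passed to it here would be made-up inputs, not the study's data.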


Number of pages: 8
