Comparative Accuracy of Diagnosis by Collective Intelligence of Multiple Physicians vs Individual Physicians

Cited by: 104
Authors
Barnett, Michael L. [1, 2, 3]
Boddupalli, Dhruv [4]
Nundy, Shantanu [5]
Bates, David W. [1, 2, 3]
Affiliations
[1] Harvard TH Chan Sch Publ Hlth, Dept Hlth Policy & Management, 677 Huntington Ave,Kresge 411, Boston, MA 02115 USA
[2] Brigham & Womens Hosp, Dept Med, Div Gen Internal Med & Primary Care, 75 Francis St, Boston, MA 02115 USA
[3] Harvard Med Sch, Dept Med, Boston, MA 02115 USA
[4] Univ Calif San Francisco, Dept Med, San Francisco, CA USA
[5] George Washington Univ, Milken Inst Sch Publ Hlth, Washington, DC USA
Keywords
PERFORMANCE;
DOI
10.1001/jamanetworkopen.2019.0096
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002; 100201;
Abstract
IMPORTANCE The traditional approach of diagnosis by individual physicians has a high rate of misdiagnosis. Pooling multiple physicians' diagnoses (collective intelligence) is a promising approach to reducing misdiagnoses, but its accuracy in clinical cases is unknown to date.
OBJECTIVE To assess how the diagnostic accuracy of groups of physicians and trainees compares with the diagnostic accuracy of individual physicians.
DESIGN, SETTING, AND PARTICIPANTS Cross-sectional study using data from the Human Diagnosis Project (Human Dx), a multicountry data set of ranked differential diagnoses by individual physicians, graduate trainees, and medical students (users) solving user-submitted, structured clinical cases. From May 7, 2014, to October 5, 2016, groups of 2 to 9 randomly selected physicians solved individual cases. Data analysis was performed from March 16, 2017, to July 30, 2018.
MAIN OUTCOMES AND MEASURES The primary outcome was diagnostic accuracy, assessed as a correct diagnosis in the top 3 ranked diagnoses for an individual; for groups, the top 3 diagnoses were a collective differential generated using a weighted combination of user diagnoses with a variety of approaches. A version of the McNemar test was used to account for clustering across repeated solvers to compare diagnostic accuracy.
RESULTS Of the 2069 users solving 1572 cases from the Human Dx data set, 1228 (59.4%) were residents or fellows, 431 (20.8%) were attending physicians, and 410 (19.8%) were medical students. Collective intelligence was associated with increasing diagnostic accuracy, from 62.5% (95% CI, 60.1%-64.9%) for individual physicians up to 85.6% (95% CI, 83.9%-87.4%) for groups of 9 (23.0% difference; 95% CI, 14.9%-31.2%; P<.001). The range of improvement varied by the specifications used for combining groups' diagnoses, but groups consistently outperformed individuals regardless of approach. Absolute improvement in accuracy from individuals to groups of 9 varied by presenting symptom from an increase of 17.3% (95% CI, 6.4%-28.2%; P=.002) for abdominal pain to 29.8% (95% CI, 3.7%-55.8%; P=.02) for fever. Groups from 2 users (77.7% accuracy; 95% CI, 70.1%-84.6%) to 9 users (85.5% accuracy; 95% CI, 75.1%-95.9%) outperformed individual specialists in their subspecialty (66.3% accuracy; 95% CI, 59.1%-73.5%; P<.001 vs groups of 2 and 9).
CONCLUSIONS AND RELEVANCE A collective intelligence approach was associated with higher diagnostic accuracy compared with individuals, including individual specialists whose expertise matched the case diagnosis, across a range of medical cases. Given the few proven strategies to address misdiagnosis, this technique merits further study in clinical settings.
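The abstract describes the group outcome as a collective differential built from a weighted combination of users' ranked diagnoses, scored as correct when the true diagnosis appears in the top 3. The sketch below illustrates one way such pooling could work. It is not the study's implementation: the 1/rank weighting, the function names `collective_differential` and `is_correct`, the lowercase string matching, and the sample diagnoses are all assumptions chosen for illustration, since the abstract states only that a variety of weighting approaches were tried.

```python
from collections import defaultdict


def collective_differential(ranked_lists, top_k=3):
    """Pool several ranked differentials into one collective top-k list.

    Assumed weighting (not taken from the study): a diagnosis listed at
    rank r by one solver contributes 1/r to that diagnosis's total score.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, diagnosis in enumerate(ranked, start=1):
            scores[diagnosis.strip().lower()] += 1.0 / rank
    # Highest score first; ties broken alphabetically so output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [diagnosis for diagnosis, _ in ordered[:top_k]]


def is_correct(differential, true_diagnosis, top_k=3):
    """Top-k accuracy rule: correct if the true diagnosis is in the first k entries."""
    return true_diagnosis.strip().lower() in differential[:top_k]


# Hypothetical ranked differentials from three solvers for a single case.
solvers = [
    ["pneumonia", "pulmonary embolism", "bronchitis"],
    ["bronchitis", "pneumonia", "asthma"],
    ["pulmonary embolism", "pneumonia", "heart failure"],
]

group = collective_differential(solvers)
print(group)
print(is_correct(group, "Pneumonia"))
```

With these made-up inputs, the second solver alone would miss a true diagnosis of pulmonary embolism in their top 3, while the pooled list includes it. Other weighting rules (for example, equal weight for every listed diagnosis) would slot into the same structure by changing the score increment.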
Pages: 11