Prevalence and Predictability of Low-Yield Inpatient Laboratory Diagnostic Tests

Cited by: 29
Authors
Xu, Song [1]
Hom, Jason [2]
Balasubramanian, Santhosh [1]
Schroeder, Lee F. [3]
Najafi, Nader [4]
Roy, Shivaal [5]
Chen, Jonathan H. [1,2]
Affiliations
[1] Stanford Univ, Dept Med, Ctr Biomed Informat Res, 1265 Welch Rd,Med Sch Off Bldg X213, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Med, Div Hosp Med, Stanford, CA 94305 USA
[3] Univ Michigan, Sch Med, Dept Pathol, Ann Arbor, MI USA
[4] Univ Calif San Francisco, Dept Med, San Francisco, CA 94143 USA
[5] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Funding
National Institutes of Health (USA);
Keywords
HOSPITAL-ACQUIRED ANEMIA; CARE; INTERVENTION; PREDICTION; REDUCTION; IMPACT;
DOI
10.1001/jamanetworkopen.2019.10967
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002; 100201;
Abstract
IMPORTANCE Laboratory testing is an important target for high-value care initiatives, constituting the highest volume of medical procedures. Prior studies have found that up to half of all inpatient laboratory tests may be medically unnecessary, but a systematic method to identify these unnecessary tests in individual cases is lacking.

OBJECTIVE To systematically identify low-yield inpatient laboratory testing through personalized predictions.

DESIGN, SETTING, AND PARTICIPANTS This retrospective diagnostic study with multivariable prediction models assessed 116 637 inpatients treated at Stanford University Hospital from January 1, 2008, to December 31, 2017; 60 929 inpatients treated at University of Michigan from January 1, 2015, to December 31, 2018; and 13 940 inpatients treated at the University of California, San Francisco from January 1 to December 31, 2018.

MAIN OUTCOMES AND MEASURES Diagnostic accuracy measures, including sensitivity, specificity, negative predictive values (NPVs), positive predictive values (PPVs), and area under the receiver operating characteristic curve (AUROC), of machine learning models when predicting whether inpatient laboratory tests yield a normal result as defined by local laboratory reference ranges.

RESULTS In the recent data sets (July 1, 2014, to June 30, 2017) from Stanford University Hospital (including 22 664 female inpatients with a mean [SD] age of 58.8 [19.0] years and 22 016 male inpatients with a mean [SD] age of 59.0 [18.1] years), among the top 20 highest-volume tests, 792 397 were repeats of orders within 24 hours, including tests that are physiologically unlikely to yield new information that quickly (eg, white blood cell differential, glycated hemoglobin, and serum albumin level). The best-performing machine learning models predicted normal results with an AUROC of 0.90 or greater for 12 stand-alone laboratory tests (eg, sodium AUROC, 0.92 [95% CI, 0.91-0.93]; sensitivity, 98%; specificity, 35%; PPV, 66%; NPV, 93%; lactate dehydrogenase AUROC, 0.93 [95% CI, 0.93-0.94]; sensitivity, 96%; specificity, 65%; PPV, 71%; NPV, 95%; and troponin I AUROC, 0.92 [95% CI, 0.91-0.93]; sensitivity, 88%; specificity, 79%; PPV, 67%; NPV, 93%) and 10 common laboratory test components (eg, hemoglobin AUROC, 0.94 [95% CI, 0.92-0.95]; sensitivity, 99%; specificity, 17%; PPV, 90%; NPV, 81%; creatinine AUROC, 0.96 [95% CI, 0.96-0.97]; sensitivity, 93%; specificity, 83%; PPV, 79%; NPV, 94%; and urea nitrogen AUROC, 0.95 [95% CI, 0.94-0.96]; sensitivity, 87%; specificity, 89%; PPV, 77%; NPV, 94%).

CONCLUSIONS AND RELEVANCE The findings suggest that low-yield diagnostic testing is common and can be systematically identified through data-driven methods and patient context-aware predictions. Machine learning models appear able to explicitly quantify the level of uncertainty and the expected information gained from diagnostic tests, with the potential to encourage useful testing and discourage low-value testing that incurs direct costs and indirect harms.
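The accuracy measures named in the abstract can be illustrated with a minimal sketch. It assumes, as the abstract states, that the "positive" class is a normal test result, so sensitivity is the share of truly normal results the model flags as normal and NPV is the share of predicted-abnormal results that really are abnormal. The function names, the 0.5 threshold, and the toy data are illustrative assumptions and are not taken from the study; the AUROC here is a simple pairwise-ranking estimate, not necessarily the method the authors used.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def diagnostic_accuracy(actual_normal: Sequence[bool],
                        predicted_normal: Sequence[bool]) -> Accuracy:
    """Confusion-matrix measures with 'normal result' as the positive class.

    Assumes each class occurs at least once among both the actual and the
    predicted labels; otherwise a denominator would be zero.
    """
    tp = fp = tn = fn = 0
    for actual, predicted in zip(actual_normal, predicted_normal):
        if predicted and actual:
            tp += 1   # predicted normal, truly normal
        elif predicted and not actual:
            fp += 1   # predicted normal, truly abnormal
        elif not predicted and actual:
            fn += 1   # predicted abnormal, truly normal
        else:
            tn += 1   # predicted abnormal, truly abnormal
    return Accuracy(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


def auroc(actual_normal: Sequence[bool], scores: Sequence[float]) -> float:
    """Probability that a random normal case outscores a random abnormal one.

    Ties count as half. Quadratic in the number of cases, so only suitable
    for small illustrative inputs.
    """
    normal = [s for a, s in zip(actual_normal, scores) if a]
    abnormal = [s for a, s in zip(actual_normal, scores) if not a]
    wins = 0.0
    for n in normal:
        for ab in abnormal:
            if n > ab:
                wins += 1.0
            elif n == ab:
                wins += 0.5
    return wins / (len(normal) * len(abnormal))


# Toy data (invented for illustration): whether each result was actually
# normal, and a hypothetical model's predicted probability of "normal".
toy_actual = [True, True, True, False, False]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.1]
toy_predicted = [score >= 0.5 for score in toy_scores]  # assumed threshold

toy_accuracy = diagnostic_accuracy(toy_actual, toy_predicted)
toy_auroc = auroc(toy_actual, toy_scores)

if __name__ == "__main__":
    print(toy_accuracy)
    print(toy_auroc)
```

Sensitivity, specificity, PPV, and NPV all depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds. This is consistent with the abstract reporting a similar AUROC for sodium and troponin I alongside very different specificities.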
Pages: 13