Evaluating the positive predictive value of code-based identification of cirrhosis and its complications utilizing GPT-4

Cited by: 0
Authors
Far, Aryana T. [1]
Bastani, Asal [1]
Lee, Albert [2, 3]
Gologorskaya, Oksana [2, 3]
Huang, Chiung-Yu [4]
Pletcher, Mark J. [4]
Lai, Jennifer C. [1]
Ge, Jin [1]
Affiliations
[1] Univ Calif San Francisco, Dept Med, Div Gastroenterol & Hepatol, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, Acad Res Serv, San Francisco, CA 94143 USA
[3] Univ Calif San Francisco, Bakar Computat Hlth Sci Inst, San Francisco, CA 94143 USA
[4] Univ Calif San Francisco, Dept Epidemiol & Biostat, San Francisco, CA 94143 USA
Keywords
cirrhosis; cohort identification; large language models (LLMs); natural language processing; CHRONIC LIVER-DISEASE; DATABASES; PROGNOSIS; OUTCOMES; CHILD; MODEL
DOI
10.1097/HEP.0000000000001115
Chinese Library Classification (CLC) number
R57 [Diseases of the digestive system and abdomen]
Abstract
Background and Aims: Diagnosis code classification is a common method for cohort identification in cirrhosis research, but it is often inaccurate and has to be augmented by labor-intensive chart review. Natural language processing using large language models (LLMs) is a potentially more accurate method. To assess LLMs' potential for cirrhosis cohort identification, we compared code-based versus LLM-based classification with chart review as a "gold standard."

Approach and Results: We extracted and conducted a limited chart review of 3788 discharge summaries of cirrhosis admissions. We engineered zero-shot prompts using Generative Pre-trained Transformer 4 (GPT-4) to determine whether cirrhosis and its complications were active hospitalization problems. We calculated positive predictive values (PPVs) of LLM-based classification versus limited chart review and PPVs of code-based versus LLM-based classification as a "silver standard" in all 3788 summaries. Compared to gold standard chart review, code-based classification achieved PPVs of 82.2% for identifying cirrhosis, 41.7% for hepatic encephalopathy (HE), 72.8% for ascites, 59.8% for gastrointestinal bleeding, and 48.8% for spontaneous bacterial peritonitis. Compared to the chart review, GPT-4 achieved accuracies of 87.8%-98.8% for identifying cirrhosis and its complications. Using the LLM as a silver standard, code-based classification achieved PPVs of 79.8% for identifying cirrhosis, 53.9% for HE, 55.3% for ascites, 67.6% for gastrointestinal bleeding, and 65.5% for spontaneous bacterial peritonitis.

Conclusions: LLM-based classification was highly accurate versus manual chart review in identifying cirrhosis and its complications. This allowed us to assess the performance of code-based classification at scale using LLMs as a silver standard. These results suggest LLMs could augment or replace code-based cohort classification and raise questions regarding the necessity of chart review.
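
The PPVs reported in the abstract follow the standard definition: of the cases a method flags as positive, the fraction the reference standard confirms (true positives divided by true positives plus false positives). The snippet below is a minimal sketch of that calculation, not the authors' code. The function name `positive_predictive_value` and the toy label lists are hypothetical and are not data from the study.

```python
from typing import Sequence


def positive_predictive_value(flagged: Sequence[bool], reference: Sequence[bool]) -> float:
    """Return TP / (TP + FP) for a method's flags against a reference standard."""
    if len(flagged) != len(reference):
        raise ValueError("flagged and reference must have the same length")
    # True positives: flagged by the method AND confirmed by the reference.
    true_pos = sum(1 for f, r in zip(flagged, reference) if f and r)
    # All positives called by the method under evaluation (TP + FP).
    all_pos = sum(1 for f in flagged if f)
    if all_pos == 0:
        raise ValueError("PPV is undefined when no cases are flagged")
    return true_pos / all_pos


# Hypothetical toy labels for six discharge summaries (illustration only).
code_flag = [True, True, True, True, False, True]      # diagnosis code present
chart_review = [True, False, True, True, True, False]  # reference standard label

ppv = positive_predictive_value(code_flag, chart_review)
print(f"PPV = {ppv:.1%}")
```

The same function would apply to either comparison described in the abstract by swapping the reference list: chart review labels for the gold standard, or LLM labels for the silver standard.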
Pages: 12