Towards automated phenotype definition extraction using large language models

Authors
Ramya Tekumalla [1]
Juan M. Banda [2]
Affiliations
[1] Mercer University, Atlanta, GA
[2] Stanford Health Care, Stanford, CA
[3] Observational Health Data Sciences and Informatics, New York, NY
Keywords
ChatGPT; Electronic phenotyping; Evaluation; Large language models (LLMs)
DOI
10.1186/s44342-024-00023-2
Abstract
Electronic phenotyping involves a detailed analysis of both structured and unstructured data using rule-based methods, machine learning, natural language processing, and hybrid approaches. Currently, developing accurate phenotype definitions requires extensive literature reviews and input from clinical experts, which makes the process time-consuming and difficult to scale. Large language models (LLMs) offer a promising avenue for automating phenotype definition extraction but come with significant drawbacks, including unreliable output, a tendency to generate non-factual content ("hallucinations"), misleading results, and potential for harm. To address these challenges, our study had two objectives: (1) to define a standard evaluation set that ensures LLM outputs are both useful and reliable, and (2) to evaluate various prompting approaches for extracting phenotype definitions from LLMs, assessing each with this evaluation task. Our results are promising but still require human evaluation and validation for this task. Even so, enhanced phenotype extraction is possible and can reduce the time spent on literature review and evaluation. © The Author(s) 2024.
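The abstract does not say how extracted definitions are scored against the evaluation set. As a minimal sketch, assuming a phenotype definition can be reduced to a list of criteria strings (diagnoses, lab thresholds, medications) and that a clinician-curated reference list exists, one simple check is the set overlap between the LLM's criteria and the reference. The function `score_definition`, the exact-match normalization, and the example criteria are hypothetical illustrations, not the authors' evaluation method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def normalize(term: str) -> str:
    # Hypothetical normalization: lowercase and collapse whitespace only.
    return " ".join(term.lower().split())


def score_definition(extracted: list[str], reference: list[str]) -> Scores:
    """Set-overlap precision/recall/F1 of LLM-extracted criteria vs. a reference.

    Illustrative sketch only: assumes both definitions are flat lists of
    criteria strings and that a match means identical normalized text.
    """
    pred = {normalize(t) for t in extracted if t.strip()}
    gold = {normalize(t) for t in reference if t.strip()}
    true_pos = len(pred & gold)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical criteria, made up for illustration.
    reference = ["type 2 diabetes mellitus", "HbA1c >= 6.5%", "metformin"]
    extracted = ["Type 2 Diabetes Mellitus", "metformin", "insulin"]
    print(score_definition(extracted, reference))
```

Exact string matching undercounts paraphrases (for example, "T2DM" versus "type 2 diabetes mellitus"), so mapping both lists to standard vocabulary codes before comparing them would be a natural refinement; an automatic score like this does not replace the human validation the abstract calls for.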