Addressing Commonly Asked Questions in Urogynecology: Accuracy and Limitations of ChatGPT

Citations: 0
Authors
Vurture, Gregory [1 ]
Jenkins, Nicole [2 ]
Ross, James [3 ]
Sansone, Stephanie [3 ]
Conner, Ellen [3 ]
Jacobson, Nina [3 ]
Smilen, Scott [3 ]
Baum, Jonathan [2 ]
Affiliations
[1] Albert Einstein Coll Med, Montefiore Med Ctr, Dept Urol, Div Urogynecol, 1250 Waters Pl,Tower Two,9th Floor, Bronx, NY 10460 USA
[2] Hackensack Meridian Hlth Jersey Shore Univ Med Ctr, Dept Obstet & Gynecol, Neptune City, NJ USA
[3] Hackensack Meridian Hlth Jersey Shore Univ Med Ctr, Dept Obstet & Gynecol, Div Urogynecol, Neptune City, NJ USA
Keywords
Artificial intelligence; Large language model; Machine learning
DOI
10.1007/s00192-025-06184-0
CLC Number
R71 [Obstetrics and Gynecology]
Discipline Code
100211
Abstract
Introduction and Hypothesis: Existing literature suggests that large language models such as Chat Generative Pre-trained Transformer (ChatGPT) might provide inaccurate and unreliable health care information, and the literature on its performance in urogynecology is scarce. The aim of the present study was to assess ChatGPT's ability to accurately answer commonly asked urogynecology patient questions.
Methods: An expert panel of five board-certified urogynecologists and two fellows developed ten questions commonly asked by patients in a urogynecology office. Questions were phrased using the diction and verbiage a patient might use when asking a question over the internet. ChatGPT's responses were evaluated with the Brief DISCERN (BD) tool, a validated scoring system for online health care information; scores ≥ 16 indicate good-quality content. Responses were also graded on their accuracy and consistency with expert opinion and published guidelines.
Results: The average score across all ten questions was 18.9 ± 2.7. Nine of the ten questions (90%) received a response judged to be of good quality (BD ≥ 16). The lowest-scoring topic was "Pelvic Organ Prolapse" (mean BD = 14.0 ± 2.0); the highest-scoring topic was "Interstitial Cystitis" (mean BD = 22.0 ± 0). ChatGPT provided no references for its responses.
Conclusions: ChatGPT provided high-quality responses to 90% of the questions based on an expert panel's review with the BD tool. Nonetheless, given the evolving nature of this technology, continued analysis is crucial before ChatGPT can be accepted as accurate and reliable.
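A minimal sketch of the per-topic score aggregation described in the Methods, assuming hypothetical per-rater Brief DISCERN totals; this is illustrative code, not the authors' analysis, and the example rater values are constructed only to reproduce the two reported topic means:

    # Python sketch: aggregate Brief DISCERN (BD) scores per topic.
    # Rater scores and topic mapping are hypothetical; BD >= 16 marks
    # good-quality content, per the threshold stated in the Methods.
    from statistics import mean, stdev

    bd_scores = {
        # topic -> one BD total per expert rater (illustrative values)
        "Pelvic Organ Prolapse": [12, 14, 16],   # reproduces 14.0 +/- 2.0
        "Interstitial Cystitis": [22, 22, 22],   # reproduces 22.0 +/- 0
    }

    GOOD_QUALITY = 16  # Brief DISCERN good-quality threshold

    for topic, scores in bd_scores.items():
        m, sd = mean(scores), stdev(scores)
        verdict = "good quality" if m >= GOOD_QUALITY else "below threshold"
        print(f"{topic}: BD = {m:.1f} +/- {sd:.1f} ({verdict})")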
Pages: 6