Assessing the Accuracy of Information on Medication Abortion: A Comparative Analysis of ChatGPT and Google Bard AI

Cited: 10
Authors
Mediboina, Anjali [1 ]
Badam, Rajani Kumari [2 ]
Chodavarapu, Sailaja [3 ]
Affiliations
[1] Alluri Sita Ramaraju Acad Med Sci, Community Med, Eluru, India
[2] Sri Venkateswara Med Coll, Obstet & Gynaecol, Tirupati, India
[3] Govt Med Coll, Obstet & Gynaecol, Rajamahendravaram, India
Keywords
large language models; ethics; artificial intelligence; chatbots; patient information; medication abortion; Google Bard AI; ChatGPT; mifepristone
DOI
10.7759/cureus.51544
Chinese Library Classification: R5 (Internal Medicine)
Discipline Codes: 1002; 100201
Abstract
Background and objective: ChatGPT and Google Bard AI are widely used conversational chatbots, including in healthcare. While they have several strengths, they can generate seemingly correct but erroneous responses, warranting caution in medical contexts. In an era when access to abortion care is diminishing, patients may increasingly turn to online resources and AI-driven language models for information on medication abortion. This study therefore aimed to compare the accuracy and comprehensiveness of responses generated by ChatGPT 3.5 and Google Bard AI to medical queries about medication abortion.

Methods: Fourteen open-ended questions about medication abortion were formulated based on the Frequently Asked Questions (FAQs) on the National Abortion Federation (NAF) and Reproductive Health Access Project (RHAP) websites. These questions were posed to ChatGPT version 3.5 and Google Bard AI on October 7, 2023. The accuracy of the responses was analyzed by cross-referencing the generated answers against the information provided by NAF and RHAP; any discrepancies were further verified against the guidelines of the American College of Obstetricians and Gynecologists (ACOG). The rating scale used by Johnson et al. was employed for assessment: a 6-point Likert scale [from 1 (completely incorrect) to 6 (correct)] for accuracy and a 3-point scale [from 1 (incomplete) to 3 (comprehensive)] for completeness. Questions that did not yield answers were assigned a score of 0 and omitted from the correlation analysis. Data analysis and visualization were done using R software version 4.3.1. Statistical significance was assessed using Spearman's rank correlation and the Mann-Whitney U test.

Results: All questions were entered sequentially into both chatbots by the same author. On the initial attempt, ChatGPT generated relevant responses for all questions, while Google Bard AI failed to provide answers for five. Repeating the same question in Google Bard AI yielded an answer for one; two were answered when the question was rephrased; and two remained unanswered despite rephrasing. ChatGPT showed a median accuracy score of 5 (mean: 5.26, SD: 0.73) and a median completeness score of 3 (mean: 2.57, SD: 0.51); it achieved the highest accuracy score in six responses and the highest completeness score in eight. In contrast, Google Bard AI had a median accuracy score of 5 (mean: 4.5, SD: 2.03) and a median completeness score of 2 (mean: 2.14, SD: 1.03); it achieved the highest accuracy score in five responses and the highest completeness score in six. Spearman's correlation coefficient revealed no significant correlation between accuracy and completeness for ChatGPT (rs = -0.46771, p = 0.09171), while Google Bard AI showed a marginally significant correlation (rs = 0.5738, p = 0.05108). The Mann-Whitney U test indicated no statistically significant difference between ChatGPT and Google Bard AI in accuracy (U = 82, p > 0.05) or completeness (U = 78, p > 0.05).

Conclusion: While both chatbots showed similar levels of accuracy, minor errors were noted on finer points that demand specialized knowledge of abortion care, which could explain the lack of a significant correlation between accuracy and completeness. Ultimately, AI-driven language models have the potential to provide information on medication abortion, but continual refinement and oversight are needed.
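As an illustrative sketch of the statistics reported above, Spearman's rank correlation and the Mann-Whitney U statistic can be computed in plain Python. The original analysis used R version 4.3.1; the functions below are a minimal re-implementation for clarity, and any score vectors passed to them are hypothetical, not the study's data:

```python
def ranks(xs):
    """Assign 1-based ranks, averaging ranks over ties (midrank method)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        # Extend j over the run of tied values starting at position i.
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def mann_whitney_u(x, y):
    """U statistic for sample x vs. sample y; ties contribute 0.5 per pair."""
    u = 0.0
    for a in x:
        for b in y:
            u += 1.0 if a > b else (0.5 if a == b else 0.0)
    return u
```

In practice, `scipy.stats.spearmanr` and `scipy.stats.mannwhitneyu` compute the same statistics and additionally return the p-values quoted in the Results.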
Pages: 8