Assessing the readability, quality and reliability of responses produced by ChatGPT, Gemini, and Perplexity regarding most frequently asked keywords about low back pain

Cited by: 5
Authors
Ozduran, Erkan [1 ]
Hanci, Volkan [2 ]
Erkin, Yueksel [3 ]
Ozbek, Ilhan Celil [4 ]
Abdulkerimov, Vugar [5 ]
Affiliations
[1] Sivas Numune Hosp, Phys Med & Rehabil Pain Med, Sivas, Turkiye
[2] Dokuz Eylul Univ, Anesthesiol & Reanimat, Crit Care Med, Izmir, Turkiye
[3] Dokuz Eylul Univ, Anesthesiol & Reanimat, Pain Med, Izmir, Turkiye
[4] Hlth Sci Univ, Derince Educ & Res Hosp, Phys Med & Rehabil, Kocaeli, Turkiye
[5] Cent Clin Hosp, Anesthesiol & Reanimat, Baku, Azerbaijan
Keywords
Artificial intelligence; ChatGPT; Gemini; Low back pain; Online medical information; Perplexity; RED FLAGS; INFORMATION; INSTRUMENT;
DOI
10.7717/peerj.18847
CLC classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general]
Subject classification
07; 0710; 09
Abstract
Background: Patients who are informed about the causes, pathophysiology, treatment and prevention of a disease are better able to participate in their own treatment. Artificial intelligence (AI), which has gained popularity in recent years, is the study of algorithms that give machines the ability to reason and perform cognitive functions such as object and word recognition, problem solving and decision making. This study examined the readability, reliability and quality of the responses that three AI chatbots popular in online information delivery today (ChatGPT, Perplexity and Gemini) give to frequently searched keywords about low back pain (LBP).

Methods: Each of the three chatbots was asked the 25 keywords most frequently searched in relation to LBP, identified with Google Trends. To prevent the bias that sequential processing of the keywords could introduce into the chatbots' answers, each keyword was submitted by different users (EO, VH). Readability of the responses was measured with the Simple Measure of Gobbledygook (SMOG), Flesch Reading Ease Score (FRES) and Gunning Fog (GFG) readability scores. Quality was assessed with the Global Quality Score (GQS) and the Ensuring Quality Information for Patients (EQIP) score. Reliability was assessed with the DISCERN and Journal of the American Medical Association (JAMA) scales.

Results: The top three keywords in the Google Trends search were "Lower Back Pain", "ICD 10 Low Back Pain" and "Low Back Pain Symptoms". The readability of the responses from all three chatbots exceeded the recommended sixth-grade reading level (p < 0.001). Perplexity scored significantly higher than the other chatbots on the EQIP, JAMA, modified DISCERN and GQS evaluations (p < 0.001).

Conclusion: The chatbots' answers to keywords about LBP were difficult to read and scored low on reliability and quality. Newly introduced chatbots could guide patients better by producing clearer, higher-quality text. This study may inform future work on improving the algorithms and responses of AI chatbots.
Pages: 18
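The study's readability analysis rests on three standard formulas (FRES, Gunning Fog, SMOG). The sketch below shows how such scores can be computed from a chatbot response; it is a minimal illustration only, since the record does not state which implementation the authors used. The vowel-run syllable counter and the readability helper are assumptions for demonstration; established packages such as textstat implement the same formulas more carefully.

```python
# Minimal sketch of the three readability formulas named in the abstract.
# The vowel-run syllable counter is a crude heuristic, not the authors'
# tooling; libraries such as textstat implement these formulas more robustly.
import math
import re


def count_syllables(word: str) -> int:
    """Approximate syllable count as the number of vowel runs (floor 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability(text: str) -> dict:
    """Return FRES, Gunning Fog and SMOG scores for an English text."""
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    # Polysyllabic ("complex") words: three or more syllables.
    n_poly = sum(1 for w in words if count_syllables(w) >= 3)

    # Flesch Reading Ease: higher = easier to read.
    fres = 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)
    # Gunning Fog and SMOG both estimate the US school grade required.
    fog = 0.4 * (n_words / n_sentences + 100 * n_poly / n_words)
    smog = 1.0430 * math.sqrt(n_poly * 30 / n_sentences) + 3.1291
    return {"FRES": round(fres, 1), "GunningFog": round(fog, 1), "SMOG": round(smog, 1)}


if __name__ == "__main__":
    sample = ("Low back pain is pain felt in the lower spine. "
              "Most episodes settle within a few weeks with simple self care.")
    print(readability(sample))
```

Because Gunning Fog and SMOG report US school-grade levels directly, the finding that the chatbots' responses exceed the recommended sixth-grade level corresponds to grade scores above 6; on the inverted FRES scale it corresponds to scores well below the roughly 80-90 band associated with sixth-grade text.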