Language discrepancies in the performance of generative artificial intelligence models: an examination of infectious disease queries in English and Arabic

Cited by: 3
Authors
Sallam, Malik [1, 2, 7]
Al-Mahzoum, Kholoud [3 ]
Alshuaib, Omaima [3 ]
Alhajri, Hawajer [3 ]
Alotaibi, Fatmah [3 ]
Alkhurainej, Dalal [3 ]
Al-Balwah, Mohammad Yahya [3 ]
Barakat, Muna [4, 5]
Egger, Jan [6 ]
Affiliations
[1] Univ Jordan, Sch Med, Dept Pathol Microbiol & Forens Med, Amman 11942, Jordan
[2] Lund Univ, Fac Med, Dept Translat Med, S-22184 Malmo, Sweden
[3] Univ Jordan, Sch Med, Amman 11942, Jordan
[4] Appl Sci Private Univ, Fac Pharm, Dept Clin Pharm & Therapeut, Amman 11931, Jordan
[5] Middle East Univ, MEU Res Unit, Amman 11831, Jordan
[6] Univ Med Essen AoR, Inst AI Med IKIM, Essen, Germany
[7] Jordan Univ Hosp, Dept Clin Labs & Forens Med, Queen Rania Al Abdullah St Aljubeiha,POB 13046, Amman, Jordan
Keywords
AI chatbots; Infectious diseases; Language performance; Healthcare technology; Digital health queries; HEALTH INFORMATION; CHATGPT; CARE;
DOI
10.1186/s12879-024-09725-y
Chinese Library Classification number
R51 [Infectious diseases]
Discipline classification code
100401
Abstract
Background: Assessment of artificial intelligence (AI)-based models across languages is crucial to ensure equitable access and accuracy of information in multilingual contexts. This study aimed to compare AI model efficiency in English and Arabic for infectious disease queries.
Methods: The study employed the METRICS checklist for the design and reporting of AI-based studies in healthcare. The AI models tested included ChatGPT-3.5, ChatGPT-4, Bing, and Bard. The queries comprised 15 questions on HIV/AIDS, tuberculosis, malaria, COVID-19, and influenza. The AI-generated content was assessed by two bilingual experts using the validated CLEAR tool.
Results: In comparing AI models' performance in English and Arabic for infectious disease queries, variability was noted. English queries showed consistently superior performance, with Bard leading, followed by Bing, ChatGPT-4, and ChatGPT-3.5 (P = .012). The same trend was observed in Arabic, albeit without statistical significance (P = .082). Stratified analysis revealed higher scores for English in most CLEAR components, notably in completeness, accuracy, appropriateness, and relevance, especially with ChatGPT-3.5 and Bard. Across the five infectious disease topics, English outperformed Arabic, except for flu queries in Bing and Bard. The four AI models' performance in English was rated as "excellent", significantly outperforming their "above-average" Arabic counterparts (P = .002).
Conclusions: Disparity in AI model performance was noticed between English and Arabic in response to infectious disease queries. This language variation can negatively impact the quality of health content delivered by AI models among native speakers of Arabic. This issue is recommended to be addressed by AI developers, with the ultimate goal of enhancing health outcomes.
Pages: 13