The performance of artificial intelligence models in generating responses to general orthodontic questions: ChatGPT vs Google Bard

Cited by: 15
Authors
Daraqel, Baraa [1, 2, 3, 4]
Wafaie, Khaled [5]
Mohammed, Hisham [4]
Cao, Li [1, 2, 3]
Mheissen, Samer [5]
Liu, Yang [1, 2, 3]
Zheng, Leilei [1, 2, 3]
Affiliations
[1] Chongqing Med Univ, Stomatol Hosp, Dept Orthodont, 426 Songshibei Rd, Chongqing 401147, Peoples R China
[2] Chongqing Med Univ, Chongqing Key Lab Oral Dis & Biomed Sci, Chongqing, Peoples R China
[3] Chongqing Med Univ, Chongqing Municipal Key Lab Oral Biomed Engn Highe, Chongqing, Peoples R China
[4] Al Quds Univ, Oral Hlth Res & Promot Unit, Jerusalem, Palestine
[5] Zhengzhou Univ, Affiliated Hosp 1, Fac Dent, Dept Orthodont, Zhengzhou, Henan, Peoples R China
Keywords
ARTIFICIAL-INTELLIGENCE; INTERNET
DOI
10.1016/j.ajodo.2024.01.012
Chinese Library Classification (CLC) number
R78 [Stomatology]
Discipline classification code
1003
Abstract
Introduction: This study aimed to evaluate and compare the performance of 2 artificial intelligence (AI) models, Chat Generative Pretrained Transformer-3.5 (ChatGPT-3.5; OpenAI, San Francisco, Calif) and Google Bidirectional Encoder Representations from Transformers (Google Bard; Bard Experiment, Google, Mountain View, Calif), in terms of response accuracy, completeness, generation time, and response length when answering general orthodontic questions. Methods: A team of orthodontic specialists developed a set of 100 questions in 10 orthodontic domains. One author submitted the questions to both ChatGPT and Google Bard. The AI-generated responses from both models were randomly assigned into 2 forms and sent to 5 blinded and independent assessors. The quality of AI-generated responses was evaluated using a newly developed tool for accuracy of information and completeness. In addition, response generation time and length were recorded. Results: The accuracy and completeness of responses were high in both AI models. The median accuracy score was 9 (interquartile range [IQR]: 8-9) for ChatGPT and 8 (IQR: 8-9) for Google Bard (median difference: 1; P < 0.001). The median completeness score was similar in both models, with 8 (IQR: 8-9) for ChatGPT and 8 (IQR: 7-9) for Google Bard. The odds of accuracy and completeness were higher by 31% and 23%, respectively, in ChatGPT than in Google Bard. Google Bard's response generation time was significantly shorter than that of ChatGPT, by 10.4 seconds per question. However, both models were similar in terms of response length. Conclusions: Responses generated by both ChatGPT and Google Bard to the posed general orthodontic questions were rated as highly accurate and complete. However, acquiring answers was generally faster using the Google Bard model. (Am J Orthod Dentofacial Orthop 2024;165:652-62)
Pages: 652-662
Number of pages: 11