On the reliability of Large Language Models to misinformed and demographically informed prompts

Cited by: 1
Authors
Aremu, Toluwani [1]
Akinwehinmi, Oluwakemi [2]
Nwagu, Chukwuemeka [3]
Ahmed, Syed Ishtiaque [4]
Orji, Rita [3]
Del Amo, Pedro Arnau [2]
El Saddik, Abdulmotaleb [1,5]
Affiliations
[1] Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
[2] University of Lleida, CIMNE, Lleida, Spain
[3] Dalhousie University, Halifax, NS, Canada
[4] University of Toronto, Toronto, ON, Canada
[5] University of Ottawa, Ottawa, ON, Canada
DOI: 10.1002/aaai.12208
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We investigate the behavior and performance of Large Language Model (LLM)-backed chatbots when addressing misinformed prompts and questions containing demographic information in the domains of Climate Change and Mental Health. Through a combination of quantitative and qualitative methods, we assess the chatbots' ability to discern the veracity of statements, their adherence to facts, and the presence of bias or misinformation in their responses. Our quantitative analysis using True/False questions reveals that these chatbots can be relied on to answer such closed-ended questions correctly. However, qualitative insights gathered from domain experts show that concerns remain regarding privacy, ethical implications, and the need for chatbots to direct users to professional services. We conclude that while these chatbots hold significant promise, their deployment in sensitive areas necessitates careful consideration, ethical oversight, and rigorous refinement to ensure they serve as a beneficial augmentation to human expertise rather than an autonomous solution. Dataset and assessment information can be found at .
Pages: 15