Automated Assessment of Fidelity and Interpretability: An Evaluation Framework for Large Language Models' Explanations (Student Abstract)

Cited by: 0
Authors
Kuo, Mu-Tien [1 ,2 ]
Hsueh, Chih-Chung [1 ,2 ]
Tsai, Richard Tzong-Han [2 ,3 ]
Affiliations
[1] Chingshin Acad, Taipei, Taiwan
[2] Acad Sinica, Res Ctr Humanities & Social Sci, Taipei, Taiwan
[3] Natl Cent Univ, Dept Comp Sci & Engn, Taoyuan, Taiwan
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21 | 2024
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As Large Language Models (LLMs) become more prevalent in various fields, it is crucial to rigorously assess the quality of their explanations. Our research introduces a task-agnostic framework for evaluating free-text rationales, drawing on insights from both linguistics and machine learning. We evaluate two dimensions of explainability: fidelity and interpretability. For fidelity, we propose methods suitable for proprietary LLMs where direct introspection of internal features is unattainable. For interpretability, we use language models instead of human evaluators, addressing concerns about subjectivity and scalability in evaluations. We apply our framework to evaluate GPT-3.5 and the impact of prompts on the quality of its explanations. In conclusion, our framework streamlines the evaluation of explanations from LLMs, promoting the development of safer models.
Pages: 23554-23555
Page count: 2