Standards for Belief Representations in LLMs

Cited by: 0
Authors
Herrmann, Daniel A. [1]
Levinstein, Benjamin A. [2]
Affiliations
[1] University of Groningen, Faculty of Philosophy, Groningen, Netherlands
[2] University of Illinois Urbana-Champaign, Champaign, IL, USA
Keywords
LLMs; Belief; Decision theory; Formal epistemology; AI; Radical Interpretation; Explainable AI; Interpretability
DOI
10.1007/s11023-024-09709-6
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
As large language models (LLMs) continue to demonstrate remarkable abilities across various domains, computer scientists are developing methods to understand their cognitive processes, particularly concerning how (and if) LLMs internally represent their beliefs about the world. However, this field currently lacks a unified theoretical foundation to underpin the study of belief in LLMs. This article begins filling this gap by proposing adequacy conditions for a representation in an LLM to count as belief-like. We argue that, while the project of belief measurement in LLMs shares striking features with belief measurement as carried out in decision theory and formal epistemology, it also differs in ways that should change how we measure belief. Thus, drawing from insights in philosophy and contemporary practices of machine learning, we establish four criteria that balance theoretical considerations with practical constraints. Our proposed criteria include accuracy, coherence, uniformity, and use, which together help lay the groundwork for a comprehensive understanding of belief representation in LLMs. We draw on empirical work showing the limitations of using various criteria in isolation to identify belief representations.
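To make the "accuracy" and "coherence" criteria concrete, here is a minimal illustrative sketch in Python (not the authors' method; the probe outputs, function names, and data below are hypothetical). It assumes probabilities have already been read off an LLM's internal representations, e.g., by a trained probe, then scores accuracy with the Brier score and checks coherence as p(A) + p(¬A) ≈ 1.

```python
# Illustrative only: hypothetical probe readouts, not the paper's method.

def brier_score(probs, truths):
    """Accuracy criterion (sketch): mean squared error between probed
    'beliefs' (probabilities) and ground-truth labels (0 or 1)."""
    return sum((p - t) ** 2 for p, t in zip(probs, truths)) / len(probs)

def coherence_gap(p_statement, p_negation):
    """Coherence criterion (sketch): distance from p(A) + p(not-A) = 1."""
    return abs(p_statement + p_negation - 1.0)

# Hypothetical probabilities read off an LLM's internals for three
# statements and their negations, plus ground-truth labels.
p_statements = [0.92, 0.15, 0.70]
p_negations  = [0.10, 0.88, 0.45]
truths       = [1, 0, 1]

print("Brier score (lower is better):", brier_score(p_statements, truths))
print("Coherence gaps:", [round(coherence_gap(p, q), 2)
                          for p, q in zip(p_statements, p_negations)])
```

As the abstract notes, such checks have limits in isolation: a representation can score well on accuracy over a benchmark while violating coherence across negations, which is why the paper proposes accuracy, coherence, uniformity, and use as joint criteria.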
Pages: 25