Evaluating Large Language Models for Tax Law Reasoning

Cited by: 0
Authors
Cavalcante Presa, Joao Paulo [1 ]
Camilo Junior, Celso Goncalves [1 ]
Teles de Oliveira, Savio Salvarino [1 ]
Affiliations
[1] Fed Univ Goias UFG, Goiânia, GO, Brazil
Source
INTELLIGENT SYSTEMS, BRACIS 2024, PT I | 2025 / Vol. 15412
Keywords
Legal Reasoning; Large Language Models (LLMs); Legal Question Answering; Tax Law;
DOI
10.1007/978-3-031-79029-4_32
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The ability to reason over laws is essential for legal professionals, facilitating the interpretation and application of legal principles to complex real-world situations. Tax laws are crucial for funding government functions and shaping economic behavior, yet their interpretation poses challenges due to their complexity, constant evolution, and susceptibility to differing viewpoints. Large Language Models (LLMs) show considerable potential in supporting this reasoning process by processing extensive legal texts and generating relevant information. This study evaluates the performance of LLMs in legal reasoning within the domain of tax law for legal entities, utilizing a dataset of real-world questions and expert answers in Brazilian Portuguese. We employed quantitative metrics (BLEU, ROUGE) and a qualitative assessment using a robust LLM evaluator to ensure factual accuracy and relevance. A novel dataset was curated, comprising genuine questions from legal entities in tax law, answered by legal experts with corresponding legal texts. The evaluation includes both open-source and proprietary LLMs, providing an assessment of their effectiveness in legal reasoning tasks. The strong correlation between the robust LLM evaluator metric and BERTScore F1 suggests these metrics effectively capture semantic aspects pertinent to human-perceived quality.
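The evaluation pipeline the abstract describes can be sketched minimally in Python. This is an illustrative assumption, not the authors' code: a unigram ROUGE-1 F1 against an expert answer stands in for the n-gram metrics, and a Pearson correlation relates per-question judge scores to BERTScore F1 values (both score lists assumed precomputed).

```python
import math
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between a model answer and an expert answer."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two per-question score lists,
    e.g. LLM-judge scores vs. BERTScore F1 (hypothetical inputs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A strong positive `pearson` value over the dataset would mirror the correlation the abstract reports between the LLM-evaluator metric and BERTScore F1.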
Pages: 460-474
Page count: 15